How can programmers ensure that software is bug-free?

Author | Daniel Lemire Translator | Meniscus

Listing | CSDN (ID: CSDNnews)


Our primary goal in writing software is to make sure it is correct. The software must do what the programmer wants it to do, and must meet the needs of the user.

Double-entry bookkeeping is used for business operations, and transactions need to be recorded in at least two accounts: debit and credit. One of the advantages of double-entry bookkeeping over more primitive methods is that it enables a degree of auditing and finding errors. If we compare accounting to software programming, we can think of software testing as the equivalent of double-entry accounting and its follow-up audit.

Converting a primitive accounting system to a double-entry system is often a daunting task for accountants. In many cases, they need to rebuild their books from scratch. Similarly, adding tests to a large application that was developed without any can be very difficult. This is why testing should be a primary consideration from the start when building software.

An eager programmer, or a novice programmer, might write a routine quickly, compile it, run it, and see that the result is correct. But prudent or experienced programmers understand that routines should not be assumed to be correct.

1. Common software errors

Common software bugs can cause programs to terminate abruptly or even corrupt databases. The consequences can be dire: in 1996, a software bug caused the explosion of the Ariane 5 launch vehicle. The error was caused by the conversion of a floating-point number to a 16-bit signed integer, a type that can only represent small integer values. The floating-point value did not fit in the integer, and the program stopped when it detected this unexpected error. Ironically, the function that triggered the bug was not required: it had simply been carried over as a subsystem from an earlier Ariane model. In 1996 prices, the error cost about $400 million.
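
The kind of conversion failure described above can be guarded against with an explicit range check. The following Go sketch is only an illustration and not a reconstruction of the actual flight software; the names ToInt16 and ErrOutOfRange are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// ErrOutOfRange is a hypothetical error value used only for this illustration.
var ErrOutOfRange = errors.New("value does not fit in a 16-bit signed integer")

// ToInt16 converts f to int16. It reports an error instead of producing a
// meaningless value when f is NaN or lies outside [-32768, 32767].
func ToInt16(f float64) (int16, error) {
	if math.IsNaN(f) || f < math.MinInt16 || f > math.MaxInt16 {
		return 0, ErrOutOfRange
	}
	return int16(f), nil // truncates toward zero
}

func main() {
	for _, f := range []float64{1234.5, 40000.0} {
		v, err := ToInt16(f)
		fmt.Println(f, v, err)
	}
}
```

The caller still has to decide what to do with the error; the check only makes the failure visible at the point of conversion.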

The importance of producing correct software has long been known. Good scientists and engineers have been working on the problem for decades.

There are several common strategies for ensuring correctness. For example, if we want to do a complex scientific calculation, we can set up several independent teams to calculate the answer. If all teams arrive at the same answer, we can conclude that it is correct. This kind of redundancy is often used to guard against hardware-related failures. Unfortunately, writing multiple versions of the same software is often impractical.
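
On a very small scale, the redundancy idea can be sketched in Go as two independent implementations whose results are accepted only when they agree. The functions below (sumLoop, sumFormula, RedundantSum) are hypothetical and chosen only because the computation is easy to do in two different ways.

```go
package main

import (
	"errors"
	"fmt"
)

// sumLoop adds 1 + 2 + ... + n one term at a time.
func sumLoop(n uint64) uint64 {
	var s uint64
	for i := uint64(1); i <= n; i++ {
		s += i
	}
	return s
}

// sumFormula computes the same sum with the closed form n(n+1)/2.
func sumFormula(n uint64) uint64 {
	return n * (n + 1) / 2
}

// RedundantSum accepts the answer only when both implementations agree.
func RedundantSum(n uint64) (uint64, error) {
	a, b := sumLoop(n), sumFormula(n)
	if a != b {
		return 0, errors.New("independent implementations disagree")
	}
	return a, nil
}

func main() {
	fmt.Println(RedundantSum(1000))
}
```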

Many programmers have an advanced mathematical education, and they would like us to prove that a program is correct. Hardware failures aside, we could then be sure that the software will not encounter any bugs. In fact, today's tools are mature enough that we can sometimes prove that a program is correct.

Below, we illustrate the point with an example that uses Python's z3 library. If you don't use Python, don't worry: you don't have to actually run the example.

First, we run the command pip install z3-solver to install the necessary libraries. Suppose we need to ensure that the inequality ( 1 + y ) / 2 < y holds for all 32-bit integers. We can use the following script:

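import z3
# y is a symbolic 32-bit value
y = z3.BitVec("y", 32)
s = z3.Solver()
# assert the negation of the inequality we want to verify
s.add( ( 1 + y ) / 2 >= y )
# a satisfying model is a counterexample
if s.check() == z3.sat:
    model = s.model()
    print(model)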

In this example, we construct a 32-bit word (BitVec) to represent our integer. By default, the z3 library interprets this variable as an integer value between -2147483648 and 2147483647, that is, from -2^31 to 2^31 - 1 (inclusive). We enter the inequality ( 1 + y ) / 2 >= y, which is the opposite of the inequality we wish to verify. If z3 does not find a counterexample, then the inequality ( 1 + y ) / 2 < y holds.

When we run the script, Python displays the integer value 2863038463, indicating that z3 found a counterexample. The z3 library always prints a non-negative integer, and it is up to us to interpret the result: the number 2147483648 should be read as -2147483648, 2147483649 as -2147483647, and so on. This representation is called two's complement. So the number 2863038463 should actually be understood as the negative number -1431928833. However, the exact value does not matter; what matters is that our inequality ( 1 + y ) / 2 < y does not hold when the variable is negative. We can verify this by assigning -1 to the variable, which gives 0 < -1. The inequality also fails when the variable is 0, which gives 0 < 0, and when the variable is 1, which gives 1 < 1. So let's add the condition that the variable is greater than 1 ( s.add( y > 1 )):

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) / 2 >= y )
s.add( y > 1 )


if(s.check() == z3.sat):
    model = s.model()
    print(model)

After this modification, the script prints nothing when executed, so we can conclude that the inequality holds as long as the variable is greater than 1.
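
The two's complement reinterpretation of z3's output can be reproduced in Go, where converting a uint32 to an int32 keeps the same 32 bits. This is a minimal sketch: the helper names AsSigned and Holds are hypothetical, and it assumes that Go's wrapping int32 arithmetic and truncating division match the signed interpretation used in the z3 scripts.

```go
package main

import "fmt"

// AsSigned reinterprets the 32 bits of u as a two's complement signed integer.
func AsSigned(u uint32) int32 {
	return int32(u)
}

// Holds reports whether (1 + y) / 2 < y under wrapping 32-bit signed arithmetic.
func Holds(y int32) bool {
	return (1+y)/2 < y
}

func main() {
	y := AsSigned(2863038463) // the value printed by z3 in the text
	fmt.Println(y, Holds(y))
	for _, v := range []int32{-1, 0, 1, 2} {
		fmt.Println(v, Holds(v))
	}
}
```

Checking a handful of values this way does not replace the z3 result; it only makes the individual counterexamples easy to inspect.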

Now that we've proven that the inequality ( 1 + y ) / 2 < y holds when y is greater than 1, does the inequality ( 1 + y ) < 2 * y also hold? Let's try it out:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) >= 2 * y )
s.add( y > 1 )


if(s.check() == z3.sat):
    model = s.model()
    print(model)

When the script runs, it prints 1412098654. Twice this value is 2824197308, which does not fit in a signed 32-bit integer, so z3 interprets the product as a negative value. To avoid this problem, let's add a new condition so that the variable can still be interpreted as a positive value after being multiplied by 2:

import z3
y = z3.BitVec("y", 32)
s = z3.Solver()
s.add( ( 1 + y ) >= 2 * y )
s.add( y > 1 )
# integer division keeps the bound an integer
s.add( y < 2147483647 // 2 )


if(s.check() == z3.sat):
    model = s.model()
    print(model)

This time z3 finds no counterexample, so the result is verified. As shown above, even in relatively simple cases, this formal approach requires a lot of work. In the early days of computer science, computer scientists may have been optimistic, but by the 1970s Dijkstra and others were expressing doubts:

We have seen that automatic program verifiers can quickly reach their processing limits when verifying very small programs, even on relatively fast machines, even if they can perform many parallel processes at the same time. But even so, we still have to wonder, is the verification result really correct? Sometimes I think...

Applying such mathematical methods on a large scale is impractical. Errors come in many forms, and not all of them can be stated concisely in mathematical form. Even when we can state the problem precisely, there is no guarantee that a tool like z3 will find a solution, and as the problem gets harder, the computation takes longer and longer. In general, an empirical approach is more appropriate.
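
As a rough empirical counterpart to the z3 scripts, the following Go sketch samples 32-bit values with a fixed seed and looks for a value where 1 + y < 2 * y fails. The function names, the seed, and the sample count are arbitrary assumptions. Unlike z3, sampling can only find counterexamples; finding none proves nothing.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Holds reports whether 1 + y < 2 * y under wrapping 32-bit signed arithmetic.
func Holds(y int32) bool {
	return 1+y < 2*y
}

// FindCounterexample samples n values in [lo, hi) with a fixed seed and
// returns the first sampled value for which the inequality fails.
func FindCounterexample(lo, hi int32, n int) (int32, bool) {
	rng := rand.New(rand.NewSource(1))
	for i := 0; i < n; i++ {
		y := lo + rng.Int31n(hi-lo)
		if !Holds(y) {
			return y, true
		}
	}
	return 0, false
}

func main() {
	// Bounded range taken from the last z3 script: no counterexample expected.
	_, found := FindCounterexample(2, 1073741823, 100000)
	fmt.Println("bounded range, counterexample found:", found)
	// Without the upper bound, 2*y can wrap around and the inequality fails.
	y, found := FindCounterexample(2, 2147483647, 100000)
	fmt.Println("full positive range, counterexample found:", found, y)
}
```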

2. It is necessary to test the software

Over time, programmers came to understand the need to test software. Not all code needs to be tested: prototypes and examples often do not require further validation. However, any significant functionality intended for use in a professional environment should be at least partially tested. Testing reduces the likelihood of having to face a catastrophic situation in the future.

There are two main types of common tests.

  • Unit tests. These test specific components of a program, for example a single function. In most cases, unit tests are automated, and programmers can run them by pressing a button or entering a command. Unit tests usually avoid consuming significant resources, such as creating large files on disk or establishing network connections, and they usually do not involve configuring the operating system.

  • Integration tests. These verify complete applications. They often require access to the network and sometimes to large amounts of data. Integration tests sometimes require human intervention and application-specific knowledge, and they may require setting up the operating system and installing software. They can also be automated, at least partially. In most cases, integration tests build on unit tests.

Unit testing is usually done as part of continuous integration. Continuous integration often automates specific tasks, including unit testing, backups, applying cryptographic signatures, and more. Continuous integration can be performed periodically or when code changes.

Unit tests can be used to structure and guide software development. The tests can be written before the code itself, an approach known as "test-driven development". More often, tests are written after a feature has been developed. Writing the unit tests and developing the functionality can be done by different programmers. Sometimes it is easier to spot bugs with tests provided by other developers, because they may make different assumptions.

We can integrate tests into a function or an application. For example, an application may run some tests on startup. In this case, the tests become part of the distributed code. However, it is more common not to ship unit tests. After all, unit tests are only for programmers and do not affect the functionality of the application. In particular, they do not pose a security risk and do not affect the performance of the application.

3. Test coverage is not a good indicator of test quality

Experienced programmers often consider testing to be as important as code. So it's not uncommon to spend half your work time writing tests. While it affects the speed at which you can write code, testing is an investment in the long run, so it often saves time. Often software that is not well tested will be harder to update. Testing can reduce uncertainty about code changes or extensions.

Tests should be easy to read, simple and fast to run, and not use a lot of memory.

However, it is difficult to define the quality of tests precisely. There are several common metrics. For example, we can count the lines of code that are executed by the tests; this is called test coverage. 100% coverage means that every line of code is executed by at least one test. In practice, however, coverage is not a good indicator of test quality.

Let's take a look at the following example:

package main


import (
    "testing"
)




func Average(x, y uint16) uint16 {
   return (x + y)/2
}


func TestAverage(t *testing.T) {
    if Average(2,4) != 3 {
       t.Error(Average(2,4))
    }
}

In Go, we can run tests with the command go test. The code above includes a test for the Average function. In this example, the test passes, with 100% coverage.

However, the Average function may not behave as we expect. If we pass in the integers (40000, 40000), we expect the average to be 40000. But the sum of 40000 and 40000 cannot be represented as a 16-bit integer (uint16), so the result wraps around to (40000+40000)%65536=14464, and the function returns 7232. Feeling a little surprised? The following test will fail:

func TestAverage(t *testing.T) {
    if Average(40000,40000) != 40000 {
       t.Error(Average(40000,40000))
    }
}
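
The wraparound arithmetic can be checked directly. The sketch below places the overflowing computation next to a variant that widens the operands to uint32 before adding. Widening is one common remedy, used here only as an illustration, and the names NaiveAverage and WideAverage are hypothetical.

```go
package main

import "fmt"

// NaiveAverage computes the sum x + y in uint16, so the sum silently
// wraps around modulo 65536 before it is halved.
func NaiveAverage(x, y uint16) uint16 {
	return (x + y) / 2
}

// WideAverage widens both operands to uint32 before adding, so the sum
// (at most 131070) can no longer overflow.
func WideAverage(x, y uint16) uint16 {
	return uint16((uint32(x) + uint32(y)) / 2)
}

func main() {
	fmt.Println(NaiveAverage(40000, 40000)) // 7232
	fmt.Println(WideAverage(40000, 40000))  // 40000
}
```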

If it is feasible and fast enough, we can try to test the code more exhaustively. In the example below, Average is rewritten to avoid the overflow and is tested on every possible pair of 16-bit values:

package main


import (
    "testing"
)




func Average(x, y uint16) uint16 {
   if y > x {
     return (y - x)/2 + x
   } else {
     return (x - y)/2 + y
   }
}


func TestAverage(t *testing.T) {
  for x := 0; x <65536; x++ {
    for y := 0; y <65536; y++ {
      m :=int(Average(uint16(x),uint16(y)))
      if x < y {
        if m < x || m> y {
         t.Error("error ", x, " ", y)
        }          
      } else {
        if m < y || m> x {
         t.Error("error ", x, " ", y)
        } 
      }
    }
  }
}

In practice, we can rarely test exhaustively. Usually we use pseudo-random testing instead: for example, we can generate pseudo-random numbers and use them as parameters. In random testing, it is important to remain deterministic, i.e. to use the same values for each test run. To do this, we can provide a fixed seed to the random number generator, as in the following example:

package main


import (
    "math/rand"
    "testing"
)




func Average(x, y uint16) uint16 {
   if y > x {
     return (y - x)/2 + x
   } else {
     return (x - y)/2 + y
   }
}


func TestAverage(t *testing.T) {
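  // A locally seeded generator keeps the run deterministic; the global
  // rand.Seed function is deprecated in recent Go releases.
  rng := rand.New(rand.NewSource(1234))
  for test := 0; test <1000; test++ {
    x := rng.Intn(65536)
    y := rng.Intn(65536)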
    m :=int(Average(uint16(x),uint16(y)))
    if x < y {
      if m < x || m> y {
       t.Error("error ", x, " ", y)
      }          
    } else {
      if m < y || m> x {
       t.Error("error ", x, " ", y)
      } 
    }
  }
}

Testing based on random exploration is part of a strategy commonly referred to as "fuzzing".
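
The random tests above only check that the result lies between the two inputs. A stricter check, sketched below, compares Average against a reference computed in a wider integer type. The reference function and the CheckRandom helper are assumptions added for illustration and are not part of the original tests.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Average is the overflow-free version from the text.
func Average(x, y uint16) uint16 {
	if y > x {
		return (y-x)/2 + x
	}
	return (x-y)/2 + y
}

// reference computes the rounded-down mean in a wider type.
func reference(x, y uint16) uint16 {
	return uint16((uint32(x) + uint32(y)) / 2)
}

// CheckRandom compares Average with the reference on n pseudo-random pairs
// drawn from a generator seeded with seed, and returns the number of mismatches.
func CheckRandom(seed int64, n int) int {
	rng := rand.New(rand.NewSource(seed))
	mismatches := 0
	for i := 0; i < n; i++ {
		x := uint16(rng.Intn(65536))
		y := uint16(rng.Intn(65536))
		if Average(x, y) != reference(x, y) {
			mismatches++
		}
	}
	return mismatches
}

func main() {
	fmt.Println("mismatches:", CheckRandom(1234, 1000))
}
```

Because the seed is fixed, a failure found this way can be reproduced on the next run.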

Tests can generally be divided into two categories: positive tests and negative tests. Positive tests verify that a function or component behaves as agreed. The first test of the Average function above is a positive test. Negative tests check that the software behaves correctly in unexpected situations. We can produce negative tests by providing random data (fuzzing). If the program above were only meant to handle small integer values, then our second example could be considered a negative test.

If the code is modified incorrectly, the tests above should no longer pass. Building on this idea, we can adopt more sophisticated methods, such as randomly modifying the code and confirming that the modifications cause the tests to fail, a technique known as mutation testing.
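
The idea of modifying the code and confirming that the tests fail can be sketched by hand. In the Go example below, the three faulty variants are written manually and are purely hypothetical; a real tool would generate such variants automatically.

```go
package main

import "fmt"

// avgFunc is the signature shared by the real function and its mutants.
type avgFunc func(x, y uint16) uint16

// Average is the overflow-free version from the text.
func Average(x, y uint16) uint16 {
	if y > x {
		return (y-x)/2 + x
	}
	return (x-y)/2 + y
}

// Passes runs a small hand-written suite against f.
func Passes(f avgFunc) bool {
	cases := []struct{ x, y, want uint16 }{
		{2, 4, 3},
		{4, 2, 3},
		{40000, 40000, 40000},
	}
	for _, c := range cases {
		if f(c.x, c.y) != c.want {
			return false
		}
	}
	return true
}

// Mutants are hand-written faulty variants used only for illustration.
var Mutants = map[string]avgFunc{
	"overflowing sum": func(x, y uint16) uint16 { return (x + y) / 2 },
	"missing branch":  func(x, y uint16) uint16 { return (y-x)/2 + x },
	"returns x":       func(x, y uint16) uint16 { return x },
}

// Survivors returns the names of mutants that the suite fails to detect.
func Survivors() []string {
	var out []string
	for name, m := range Mutants {
		if Passes(m) {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	fmt.Println("original passes:", Passes(Average))
	fmt.Println("surviving mutants:", Survivors())
}
```

A mutant that survives points at behavior the suite does not actually check.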

Some programmers choose to generate tests automatically from the code. This approach runs the component and records the results. For example, in the averaging example above, Average(40000,40000) yields 7232 with the first version of the function. If the code subsequently changes in a way that changes the results, the test will fail. This approach saves time because the tests are generated automatically, and we can reach 100% test coverage quickly and easily. However, such tests can be misleading. In particular, they may record incorrect behavior. Furthermore, such tests guarantee only quantity, not quality. Tests that do not help verify the essential functionality of the software can even be harmful, because irrelevant tests waste programmers' time when later versions change.
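
The risk of recording incorrect behavior can be made concrete. In the Go sketch below, the recorded table is a hypothetical stand-in for what a test generator might capture from the overflowing version of the function. The buggy code reproduces the recording, while the corrected code does not.

```go
package main

import "fmt"

// BuggyAverage is the overflowing first version from the text.
func BuggyAverage(x, y uint16) uint16 {
	return (x + y) / 2
}

// FixedAverage is the overflow-free version from the text.
func FixedAverage(x, y uint16) uint16 {
	if y > x {
		return (y-x)/2 + x
	}
	return (x-y)/2 + y
}

// recorded holds outputs captured by running the buggy code once.
var recorded = []struct{ x, y, got uint16 }{
	{2, 4, 3},
	{40000, 40000, 7232}, // wrong answer, faithfully recorded
}

// MatchesRecording reports whether f still reproduces every recorded output.
func MatchesRecording(f func(x, y uint16) uint16) bool {
	for _, r := range recorded {
		if f(r.x, r.y) != r.got {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("buggy version matches recording:", MatchesRecording(BuggyAverage))
	fmt.Println("fixed version matches recording:", MatchesRecording(FixedAverage))
}
```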

4. The benefits of testing

Finally, let's review the benefits of testing: tests help us organize our work, they are a measure of quality, they help us document code, avoid regressions, and debug, and they help us write more efficient code.

Organization

Designing a complex piece of software can take weeks or months of hard work. Most of the time, we break down work into individual units. It's hard to judge the outcome until the final product arrives. When developing software, writing tests helps organize our work. For example, a component is not considered complete until it has been written and tested. Without the process of writing tests, it is more difficult to estimate the progress of a project because untested components may be far from complete.

Quality

Tests can also show how much care programmers put into their work. We can quickly evaluate the functions and components of a piece of software by looking at its tests: well-written tests suggest that the corresponding code is reliable, while untested features should be treated as a warning sign.

Some programming languages are very strict and can verify much of the code at compile time. Others (Python, JavaScript) leave more freedom to programmers. Some programmers believe that tests can compensate for the limitations of less strict programming languages by imposing an additional constraint on the programmer.

Documentation

Software should generally have clear and complete documentation. In practice, however, documentation is often incomplete, inaccurate, wrong, or non-existent. Tests then become the only technical specification. Programmers can read the test cases and adjust their understanding of the software's components and functionality. Unlike documentation, tests that are run regularly are generally up to date and very accurate, since they are written in a programming language and executed. Tests also demonstrate how the code can be used.

Even if we want to write high-quality documentation, testing can play an important role. To illustrate computer code, we often need to use examples. Every example can be turned into a test. Therefore, we can ensure that the examples included in the documentation are reliable. If the code changes and the examples need to be modified, the process of testing the examples will remind us to update the documentation. This way, we can avoid outdated examples in the documentation that give readers a bad experience.
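
Go's testing package supports this idea directly through example functions. The sketch below assumes the overflow-free Average from the text. The main function is included only so the sketch runs on its own; in a real project ExampleAverage would live in a file whose name ends in _test.go.

```go
package main

import "fmt"

// Average is the overflow-free version from the text.
func Average(x, y uint16) uint16 {
	if y > x {
		return (y-x)/2 + x
	}
	return (x-y)/2 + y
}

// ExampleAverage doubles as documentation and as a test: when it is placed
// in a _test.go file, `go test` runs it and compares what it prints with
// the "Output:" comment below.
func ExampleAverage() {
	fmt.Println(Average(2, 4))
	fmt.Println(Average(40000, 40000))
	// Output:
	// 3
	// 40000
}

func main() {
	ExampleAverage()
}
```

If Average later changes so that the printed values differ, the example fails like any other test, which signals that the documentation needs updating.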

Regression

Programmers regularly fix bugs in software. The same problem can recur for several reasons: the original problem was not fundamentally solved; a change in one part of the code causes the error to reappear elsewhere; or adding a new feature or optimizing the software brings the error back or introduces a new bug. When a defect reappears or a new one is introduced in this way, we call it a regression. To prevent regressions, an important step is to add a corresponding test for each bug fix or new feature. By running these tests, we can notice regressions as soon as they arise. Ideally, the regression is detected when we run the tests right after modifying the code, so that it can be avoided. To turn bugs into simple and effective tests, we should reduce each bug to its simplest form. For example, for the averaging example above, we can add the detected error as an additional test:

package main


import (
    "testing"
)




func Average(x, y uint16) uint16 {
   if y > x {
     return (y - x)/2 + x
   } else {
     return (x - y)/2 + y
   }
}


func TestAverage(t *testing.T) {
   if Average(2,4) != 3 {
     t.Error("error1")
   }
   if Average(40000,40000)!= 40000 {
     t.Error("error2")
   }           
}

Bug fixing

In practice, an extensive test suite helps us identify and correct errors faster. This is because tests narrow down where an error can be and give the programmer some assurance. In a way, the time spent writing tests reduces the time spent hunting for bugs, while also reducing the number of bugs.

Also, writing new tests is an effective strategy for identifying and correcting bugs. In the long run, this approach is more efficient than other debugging strategies such as stepping through the code. In fact, after debugging is done, in addition to fixing bugs, you should also add new unit tests.

Performance

The primary role of testing is to verify that functions and components produce the expected results. However, many programmers also use tests to measure the performance of components. For example, they may measure the execution speed of a function, the size of an executable, or memory usage. Such tests can detect performance regressions caused by code changes. You can compare the performance of your own code with that of reference code and use statistical tests to check for differences.
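
In Go, execution speed is usually measured with benchmark functions from the testing package. The sketch below benchmarks the overflow-free Average from the text. The MeasureAverage helper and the sink variable are assumptions added so the sketch can run as an ordinary program; inside a _test.go file, `go test -bench=.` would run BenchmarkAverage directly.

```go
package main

import (
	"fmt"
	"testing"
)

// Average is the overflow-free version from the text.
func Average(x, y uint16) uint16 {
	if y > x {
		return (y-x)/2 + x
	}
	return (x-y)/2 + y
}

// sink keeps the compiler from discarding the benchmarked calls.
var sink uint16

// BenchmarkAverage calls Average b.N times; the testing package picks b.N.
func BenchmarkAverage(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink += Average(uint16(i), 40000)
	}
}

// MeasureAverage runs the benchmark from ordinary code and returns the
// number of iterations and the measured time per call in nanoseconds.
func MeasureAverage() (iterations int, nsPerOp int64) {
	r := testing.Benchmark(BenchmarkAverage)
	return r.N, r.NsPerOp()
}

func main() {
	n, ns := MeasureAverage()
	fmt.Println(n, "iterations,", ns, "ns/op")
}
```

Timings vary between machines and runs, so a comparison against reference code is more meaningful when both are measured under the same conditions.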

5. Summary

All computer systems have flaws. Hardware can fail at any time. Even if the hardware is reliable, it is nearly impossible for programmers to predict all the situations that the software will encounter in operation. No matter who you are or how hard you work, your software won't be perfect. Nonetheless, we should do everything we can to write correct code: one that meets user expectations.

While it is possible to write correct code without writing tests, the benefits of test suites are tangible in difficult or larger projects. Many experienced programmers will refuse to use untested software components.

A good habit of writing tests can help you grow into a better programmer. As you write tests, you become more aware of human limitations. When interacting with other programmers and users, if you have a test suite, you can think better about their feedback.

Recommended book list

  • James Whittaker, Jason Arbon, Jeff Carollo, How Google Tests Software, Addison-Wesley Professional, 1st Edition (March 23, 2012)

  • Lisa Crispin, Janet Gregory, Agile Testing: A Practical Guide for Testers and Agile Teams, Addison-Wesley Professional, 1st Edition (December 30, 2008)

Original link:

https://lemire.me/blog/2022/01/03/how-programmers-make-sure-that-their-software-is-correct/

This article has been authorized by the author. Please credit the source when reprinting.


Origin blog.csdn.net/csdnnews/article/details/124311199