Reading notes: "UTOPIA: Automatic Generation of Fuzz Driver using Unit Tests"

  • [Reference] Jeong B, Jang J, Yi H, et al. UTOPIA: automatic generation of fuzz driver using unit tests[C]//2023 IEEE Symposium on Security and Privacy (SP). IEEE, 2023: 2676-2692.
  • [Note] This article is only the author's personal study notes. If anything here infringes on your rights, please contact the author to have it removed.

Table of contents

Summary
1. Introduction
2. Challenges and proposed methods
   1. Synthesize a valid API call sequence
   2. Synthesize valid API call parameters
   3. Challenges in utilizing unit testing
3. Design
   1. UT framework structure analysis
   2. API attribute analysis
   3. Fuzz target selection
   4. Fuzz driver synthesis
      4.1. Fuzz input assignment
      4.2. Fuzz loop structure
      4.3. Initial seed extraction
4. Implementation
5. Evaluation
   1. Automatic fuzz driver generation and its effectiveness
   3. Comparison with OSS-Fuzz drivers
   4. Evaluate UTOPIA's design decisions
   5. Experiment summary
6. Shortcomings
   1. Other sources of spurious crashes
   2. Limitations of UTOPIA analysis
8. Conclusion


Summary

  • Fuzz testing is one of the most effective methods for detecting software security vulnerabilities. When fuzzing a library, test efficiency depends on a high-quality fuzz driver. The driver should contain a suitable sequence of library API calls so that it explores as much of the library's state as possible.
  • Such fuzz drivers are usually written by hand. To reduce this burden, existing methods try to infer valid API sequences from consumer code, or to extract valid API sequences directly from usage examples, and then generate fuzz drivers automatically. However, API sequences obtained this way are tied to the original application logic.
  • The paper observes that unit tests (UTs) are carefully designed by developers to verify correct API usage, and that writing unit tests is very common during development.
    • [Note] Unit testing (UT) is a testing method in software development that verifies whether the smallest testable units of a software system behave as expected. These units are usually independent pieces such as functions, methods, or classes. The purpose of unit testing is to test them in isolation and ensure that they produce the correct output for a given input.
  • Therefore, the paper proposes UTOPIA, an open-source tool and analysis algorithm that automatically generates fuzz drivers from existing unit tests with almost no human involvement, and demonstrates its effectiveness through experiments.

1. Introduction

  • According to the test object, fuzz testing can be divided into two types:
    • End-to-end fuzzer: Test the entire program as a black box.
    • Library fuzzer: Test against a specific interface or API. (such as libFuzzer)
  • The difference between the two is that fuzzing a library requires building a fuzz driver for it, which contains the API call sequence.
  • To reduce the burden of building fuzz drivers, prior research generates them by inferring API dependencies from consumer code. However, a fuzz driver generated this way is limited by the consumer code and may contain only simple, common API sequences. This is not ideal for fuzz testing, which looks for invalid, uncommon API sequences.
  • Unlike inferred API sequences, unit tests spell out the exact order of API calls. The authors also observe:
    • Existing unit tests make it clear which API dependencies developers care about.
    • Unit tests can exercise more library APIs (such as internal APIs) than consumer code.
    • Many existing projects have well-written unit tests, as shown below.

2. Challenges and proposed methods

  • UTOPIA converts each existing unit test into a valid fuzz driver. It mainly addresses the following two problems in order to reduce manual involvement in the generation process.
    • Synthesize a valid sequence of API calls.
    • Synthesize valid API call parameters.
  • In a fuzz driver, the library may crash not only because it hits a genuine bug at run time, but also because the driver misuses the API as a result of the two problems above. Crashes caused this way are called spurious crashes, and they undermine fuzz testing.

1. Synthesize a valid API call sequence

  • A major challenge in generating fuzz drivers is determining which library APIs to call and in what order, since APIs often have strict ordering dependencies. For example, FileStorage() → writeRaw() → release().
  • Building a random sequence of API calls for a fuzz driver wastes a lot of time. For example, calling writeRaw() after release() causes a crash because the object is no longer in a valid, constructed state.
  • Limitations of using consumer code to infer API sequences:
    • To obtain the full API usage pattern from consumer code, the entire consumer code must first be analyzed. For complex consumer code, with many API calls spread across complex control flow, the extracted patterns become bloated. A fuzz driver generated this way contains a large number of API calls and requires a large number of input parameters, which hurts fuzzing efficiency.
    • Some methods limit the amount of consumer code used to generate fuzz drivers, but the API sequences obtained this way may be incomplete and may lead to spurious crashes.
  • Proposed method
    • The paper uses the explicit API sequences written in unit tests, which avoids the challenge of synthesizing API sequences altogether.
    • Unit tests (UTs) have the following advantages:
      • In a UT, the state of the library is built explicitly for each test case, so generating a fuzz driver does not carry the burden of inferring or extracting API usage patterns.
      • UTs and fuzz drivers serve the same purpose, and the test cases target the variables or properties that developers consider most important.
      • Since a unit test contains only the API sequence needed to test a specific property of the library, and that sequence is usually short, bloated API call sequences are unlikely.
      • [Note] A test case here is a concrete instance of a unit test, with specific inputs and expected outputs. After running the test case, the actual output is compared with the expected output.
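
To make the ordering problem concrete, here is a toy Python model. It is not OpenCV code: `ToyStorage` and its methods are hypothetical stand-ins for an object with an open → write → release life cycle, and a raised exception stands in for a spurious crash. The sketch enumerates every ordering of the three calls and keeps the ones that do not misuse the object.

```python
from itertools import permutations


class ToyStorage:
    """Hypothetical stateful API with an open -> write_raw -> release life cycle."""

    def __init__(self):
        self.opened = False
        self.released = False
        self.data = []

    def open(self):
        self.opened = True

    def write_raw(self, values):
        # Writing before open() or after release() models a spurious crash.
        if not self.opened or self.released:
            raise RuntimeError("write_raw called in an invalid state")
        self.data.extend(values)

    def release(self):
        if not self.opened:
            raise RuntimeError("release called before open")
        self.released = True


def run_sequence(calls):
    """Run the named calls on a fresh object; return True if no misuse occurred."""
    storage = ToyStorage()
    try:
        for name in calls:
            if name == "write_raw":
                storage.write_raw([1, 2, 3])
            else:
                getattr(storage, name)()
    except RuntimeError:
        return False
    return True


def valid_orders():
    """Return every ordering of the three calls that runs without misuse."""
    calls = ("open", "write_raw", "release")
    return [order for order in permutations(calls) if run_sequence(order)]
```

In this toy model only one of the six orderings survives, and it is the order a unit test for the object would already spell out.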

2. Synthesize valid API call parameters

  • Beyond the call sequence itself, it is also necessary to understand the logic within and between APIs and to assign fuzz input values in a way that respects their semantic relationships.
  • For example, if a parameter used for memory allocation or loop counting is fuzzed to a very large value, an out-of-memory or timeout error results. While these are not spurious crashes, they reduce the efficiency of fuzz testing.
  • Between APIs (inter-API) there are three main relationships:
    • out-to-in: The output of one API serves as the input of another.
    • fixed: The same parameter must stay consistent across different API calls (such as API_1(x); API_2(x);).
    • relative: Parameters of different API calls are derived from one another (such as x=f(y); API_1(x); z=x+g(y); API_2(z);)
    • [Example] var a=3; → b=func(a); → Target_API(b); When assigning values during fuzz testing, if the calling order between APIs is ignored, b may be assigned directly instead of a.
  • Within an API (intra-API) there are two main relationships:
    • array ↔ length: One input parameter represents the length of another input parameter.
    • array ↔ index: One input parameter is an index into another input parameter.
    • [Example] In the figure below, the first parameter of the Mat class constructor must be consistent with the size of the arrays declared in the second and fourth parameters. If these parameters are fuzzed independently at random, the fuzz driver will often cause a segfault (size parameter > the actual size of the array) or waste effort mutating fuzz input bytes that are never used (size parameter < the actual size of the array).
  • Proposed method
    • UTOPIA uses static analysis to decide where to place fuzz inputs (which API parameters) and how to mutate them, while preserving the original data flow of the unit test (how the variables evolve during the run).
    • To identify suitable places to inject fuzz input, the paper introduces the concept of a "root definition": an assignment statement in which a variable is defined by a constant. By assigning fuzz inputs only at root definitions, UTOPIA preserves the original data flow and the existing API semantic relationships. Put simply, root definitions mark the places to fuzz, and fuzz input reaches the API parameters indirectly through them.
    • In the figure below, UTOPIA passes fuzz input to the third parameter rawdata of writeRaw() (line 31) by assigning fuzz input at the root definition (line 23, where fi8 and fi9 are fuzz inputs, that is, the mutated values), where each element of the vector rawdata was originally assigned a constant value.
    • After locating the root definitions, UTOPIA injects fuzz input for the API parameters that receive their values from those root definitions, based on the analyzed properties. For example, for the constructor of the Mat class (line 18), UTOPIA infers the array length relationship and assigns the size of dim (the array) to the first parameter (the array length) on line 18, while every element of the array receives fuzz input (line 17).
    • [Note] The above figure is a simplified unit test based on the FileStorage test in OpenCV, which stores and reloads matrix data by encoding it as XML. Based on this unit test, UTOPIA generates a fuzz driver (differences are marked with -/+). The global variables fi{1-9} are the fuzz inputs mutated for each run.
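
The array ↔ length relationship can be illustrated with a small Python sketch. Everything here is hypothetical: `make_matrix` is a made-up API whose first argument must not exceed the real size of its second, and an `IndexError` stands in for the out-of-bounds read that would be a segfault in C/C++. The first driver fuzzes both arguments independently; the second fuzzes only the array elements and derives the length from the array, which is the spirit of assigning fuzz input only at the root definition.

```python
def make_matrix(ndims, sizes):
    """Hypothetical API: `ndims` is the length of `sizes` (array <-> length)."""
    if ndims > len(sizes):
        # In C/C++ this would read past the end of the array.
        raise IndexError("ndims exceeds the real array size")
    return [sizes[i] for i in range(ndims)]


def naive_driver(fuzz):
    """Fuzzes the length and the array independently: the relation can break."""
    ndims = fuzz[0]
    sizes = list(fuzz[1:4])
    return make_matrix(ndims, sizes)


def relation_aware_driver(fuzz):
    """Fuzzes only the array elements and derives the length from the array."""
    sizes = list(fuzz[0:3])   # fuzz input lands on the constant elements
    ndims = len(sizes)        # the length stays consistent with the array
    return make_matrix(ndims, sizes)
```

With the input `bytes([200, 1, 2, 3])`, the naive driver fails inside `make_matrix`, while the relation-aware driver still exercises the API with fuzzed element values.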

3. Challenges in utilizing unit testing

  • Analysis hindrance
    • Unit testing (UT) frameworks may be defined by a complex mix of class hierarchies and interfaces that are used to indirectly invoke user-defined test cases. Indirect calls through these interfaces can cause spurious crashes in fuzz drivers generated from unit tests, so these issues need to be fixed manually before fuzz testing.
    • Although dynamic analysis can handle indirect calls, it suffers from over-approximation and difficulty in handling the associated semantics between parameter values, so it is not suitable for solving this type of problem.
  • UT framework diversity
    • Because UT frameworks differ from one another, solving the problems under "Analysis hindrance" may require handling and fixes tailored to each framework. Fixing these problems manually one by one would consume a lot of time and manpower.
  • Assertion
    • Assertions in a UT are used not only to check critical conditions but also to verify that results match the specific test values defined in the unit test. Because injecting fuzz input into parameters may trigger assertion conditions, it is important to consider how assertions affect fuzz testing and to handle them appropriately.
    • If all assertions are ignored, the nullptr checks on pointers are skipped as well, which makes spurious crashes from dereferencing nullptr more likely. However, if all assertions are enforced, the checks against test values will usually prevent the fuzz driver from continuing past the assertion statement.

3. Design

  • UTOPIA converts UTs into valid fuzz drivers by analyzing the UTs and the target library code. The picture below shows the overall workflow of UTOPIA.
    • UTOPIA takes advantage of the architectural features of the UT framework, so it only needs to analyze the test functions implemented by developers rather than the entire UT framework.
    • UTOPIA analyzes the library and identifies the properties of API parameters.
    • UTOPIA analyzes the UT to identify root definitions, so that fuzz input can be injected without affecting valid API usage semantics.
    • UTOPIA generates the fuzz driver based on the analysis results.

1. UT framework structure analysis

  • Generally speaking, the API provided by a UT framework lets users define three functions for each test case: pre-test, test and post-test. Take GoogleTest (gtest, a UT framework) as an example: as shown below, it exposes the SetUp(), TestBody() and TearDown() interfaces to each test class (corresponding to pre-test, test and post-test respectively).
  • These functions implicitly ensure the following two things:
    • Each test case depends only on these functions.
    • Test cases are independent of each other.
  • UTOPIA leverages these features by explicitly calling these functions in the fuzzing loop, which builds valid API sequences and keeps each fuzzing iteration independent.
  • Clang AST Matchers
    • UTOPIA uses Clang AST Matchers (a tool) to locate these functions by finding functions that match Abstract Syntax Tree (AST) patterns. For example, in the image above, UTOPIA looks for a CXXRecordDecl that has, among its child nodes, a CXXCtorInitializer for the Test::Test class. SetUp is then found by searching for a CXXMethodDecl named SetUp within the CXXRecordDecl found.
    • Other functions are located in a similar way. To support a new UT framework, developers only need to specify the pattern of its test functions, which reduces the work of supporting different UT frameworks.

2. API attribute analysis

  • UTOPIA treats all exported functions of a library as public APIs and analyzes the parameters of each API to determine their properties. It analyzes the program by following def-use chains that start at the API parameters, and determines five properties: Output, FilePath, AllocSize, LoopCount, and Array↔Length (index).
    • Output: The parameter is used to return values to the caller of the API, similar to a return value.
    • FilePath: The parameter is used as a file path in file operations.
    • AllocSize: The parameter specifies an allocation size.
    • LoopCount: The parameter determines the iteration count of a loop in the library.
    • Array: The parameter is used as an array in library code.
    • Length: The parameter gives the length of an array in library code.
  • Def-Use (DU) chain
    • Property analysis focuses on how parameters behave inside the library and follows each parameter along its def-use chain. A chain connects a parameter's definition with all uses reachable from that definition, which makes it possible to determine whether the parameter has a specific property.
  • Inter-procedural analysis
    • UTOPIA analyzes the library function by function. If a use in a def-use chain is an argument of a call to another function, UTOPIA first analyzes the called function and then merges the analysis results of the corresponding parameter of the called function. When external function calls are involved, UTOPIA also supports loading pre-computed analysis results of other libraries to obtain more precise results for external functions.
  • The analysis process of UTOPIA is as follows:
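
As a rough illustration of the def-use idea, and not of UTOPIA's actual algorithm, the following Python sketch works on a toy description of a library. The library table, the use tags and all function names are hypothetical. Each parameter lists the uses reachable from its definition; a use is either a tag that implies a property or a call that forwards the value to a parameter of another function, in which case the callee is analyzed first and its result is merged into the caller's.

```python
# Toy library: each function maps a parameter name to the uses reachable
# from its definition. A use is either a tag or a call into another function.
TOY_LIBRARY = {
    "make_buffer": {"n": [("alloc_size",)]},
    "fill": {"buf": [("array_access",)], "count": [("loop_bound",)]},
    "create_and_fill": {
        "size": [("call", "make_buffer", "n"), ("call", "fill", "count")],
        "out": [("written",)],
    },
}

# Which property each kind of use implies (names follow the list above).
USE_TO_PROPERTY = {
    "alloc_size": "AllocSize",
    "loop_bound": "LoopCount",
    "array_access": "Array",
    "written": "Output",
}


def analyze_param(library, func, param, _seen=None):
    """Return the set of properties of `param`, following calls into callees."""
    seen = _seen if _seen is not None else set()
    if (func, param) in seen:      # guard against recursive call chains
        return set()
    seen.add((func, param))
    props = set()
    for use in library[func][param]:
        if use[0] == "call":
            _, callee, callee_param = use
            # Analyze the callee first, then merge its result into the caller.
            props |= analyze_param(library, callee, callee_param, seen)
        else:
            props.add(USE_TO_PROPERTY[use[0]])
    return props
```

In this toy library, the `size` parameter of `create_and_fill` ends up with both AllocSize and LoopCount, because it is forwarded to an allocation in one callee and to a loop bound in another.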

3. Fuzz target selection

  • Fuzz targets are the places where UTOPIA can appropriately inject fuzz input into the arguments of library API calls, that is, the root definitions of those arguments. They are found by root definition analysis.
  • Root definition analysis
    • Root definition analysis is a backward data-flow analysis. Its purpose is to find definitions whose rvalues are constants that are not derived from other variables in the test code. Root definitions therefore let UTOPIA inject fuzz input without violating the semantics of the test code. UTOPIA performs root definition analysis on all API parameters to collect every possible fuzz target candidate.
  • As shown in the figure below, 'int A=10' is the only root definition recognized. Changing the rvalue of the root definition affects the parameters of every API while maintaining the relationships between the APIs. To identify all definitions that may affect API parameters, the analysis is control-flow-sensitive and inter-procedural.
  • Inheritance of parameter attributes
    • To determine the mutation strategy, UTOPIA must pair each root definition with the properties of the corresponding parameters. This is done by assigning a parameter's properties to the root definition that is used directly for that parameter.
    • For example, in the image above, the root definition 'int A = 10' inherits the properties of the first parameter of API_1 and of API_2. However, the properties of the first parameter of API_4 are not inherited, because the root definition is not used directly for that parameter. During root definition analysis, the trace target changes from C to B via 'int C = API_3(B)'.
  • Inference of external functions
    • If the trace target is defined by an external function, UTOPIA traces all of that function's input parameters to find all possible definitions.
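
The backward tracing can be sketched in Python on a toy straight-line program. This is an illustration under assumptions, not UTOPIA's implementation: the statement encoding is made up, the program is a guess at the kind of code the figure describes (in particular, `B = A + 1` is an assumed derivation), and control flow is ignored, whereas the real analysis is control-flow-sensitive and inter-procedural. For every API argument, the sketch walks back to the constant definitions it depends on and records whether the argument uses a root definition directly, which decides property inheritance.

```python
# Toy straight-line test code. Statement shapes (all hypothetical):
#   ("const", var, value)              var = constant
#   ("derive", var, [inputs])          var computed from other variables
#   ("call", var_or_None, api, [args]) optional result variable
PROGRAM = [
    ("const", "A", 10),                 # int A = 10;
    ("call", None, "API_1", ["A"]),     # API_1(A);
    ("derive", "B", ["A"]),             # int B = A + 1;  (assumed)
    ("call", None, "API_2", ["A"]),     # API_2(A);
    ("call", "C", "API_3", ["B"]),      # int C = API_3(B);
    ("call", None, "API_4", ["C"]),     # API_4(C);
]


def input_vars(stmt):
    """Variables a statement reads."""
    if stmt[0] == "const":
        return []
    return stmt[2] if stmt[0] == "derive" else stmt[3]


def reaching_def(program, var, before):
    """Index of the latest definition of `var` before statement `before`."""
    for i in range(before - 1, -1, -1):
        if program[i][1] == var:
            return i
    return None


def root_defs(program, var, before):
    """Indices of the constant definitions that `var` depends on."""
    idx = reaching_def(program, var, before)
    if idx is None:
        return set()
    stmt = program[idx]
    if stmt[0] == "const":
        return {idx}
    roots = set()
    for v in input_vars(stmt):          # e.g. trace target moves from C to B
        roots |= root_defs(program, v, idx)
    return roots


def analyze(program):
    """Map (api, arg position) -> (root definition indices, used directly?)."""
    result = {}
    for i, stmt in enumerate(program):
        if stmt[0] != "call":
            continue
        for pos, arg in enumerate(stmt[3]):
            d = reaching_def(program, arg, i)
            direct = d is not None and program[d][0] == "const"
            result[(stmt[2], pos)] = (root_defs(program, arg, i), direct)
    return result
```

Every API argument in this toy program traces back to statement 0 (`A = 10`), but only API_1 and API_2 use it directly, so only their parameter properties would be inherited by the root definition.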

4. Fuzz driver synthesis

4.1. Fuzz input assignment
  • UTOPIA converts each test case into a fuzz driver by replacing the identified fuzz targets with fuzz input assignment statements. Among the identified fuzz targets, UTOPIA excludes a root definition if its source code cannot be modified or if no appropriate way of generating fuzz input can be determined. The exclusion criteria are as follows:
    • A root definition in a header file or project file.
    • Constants determined at compile time, such as sizeof(int).
    • The assignment takes a return value or output parameter (not an input parameter) of an external function.
    • The root definition is assigned nullptr, because it is not known how to initialize the object referenced by the pointer.
    • Function pointer parameters.
    • Values that depend on excluded values, such as an ArrayLen whose Array was excluded.
    • File properties.
  • After exclusion, UTOPIA replaces the rvalue of the assignment statement with fuzz input according to the data type and mutation strategy of the assignment statement.
  • UTOPIA applies the following mutation strategies to comply with API semantics:
    • FilePath: Write the fuzz input into the file's content rather than into the file path.
    • AllocSize: Limit the range of fuzz input used as a memory allocation size.
    • LoopCount: Limit the range of fuzz input used as a loop exit condition.
    • Array: Treat the fuzz input as an array, that is, create an array and assign fuzz input to each of its elements.
    • ArrayLength/Index: Limit the fuzz input to at most the size of the created array minus one.
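
A minimal Python sketch of how such strategies could constrain raw fuzz values is shown below. The caps `MAX_ALLOC` and `MAX_LOOP`, the use of modulo for limiting, and all helper names are assumptions made for illustration; the notes do not state the concrete limits or mechanism UTOPIA uses.

```python
import os
import tempfile

MAX_ALLOC = 1 << 16   # assumed cap on fuzzed allocation sizes
MAX_LOOP = 1024       # assumed cap on fuzzed loop counts


def constrain(prop, raw, array_len=None):
    """Map a raw non-negative fuzz value into a range that suits the property."""
    if prop == "AllocSize":
        return raw % (MAX_ALLOC + 1)
    if prop == "LoopCount":
        return raw % (MAX_LOOP + 1)
    if prop == "ArrayIndex":
        if not array_len:
            raise ValueError("an index needs a non-empty array")
        return raw % array_len          # at most array_len - 1
    return raw                          # no property: use the value as is


def make_array(fuzz_bytes, length):
    """Array: every element gets its own fuzz byte (zero-padded if short)."""
    return list(fuzz_bytes[:length].ljust(length, b"\0"))


def file_with_content(fuzz_bytes):
    """FilePath: keep the path valid and put the fuzz input in the content."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(fuzz_bytes)
    return path
```

The design point is that the fuzzer still controls the interesting data (element values, file content, sizes within bounds), while the values that would only produce out-of-memory errors, timeouts or out-of-bounds accesses are kept inside a safe range.
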
4.2. Fuzz loop structure
  • UTOPIA builds an entry function that is called once in each fuzzing iteration. The entry function receives fuzz input from the fuzzing engine (such as libFuzzer), then identifies and calls the test functions in order (such as SetUp(), TestBody() and TearDown() in gtest), so that the fuzz driver runs with the given fuzz input.
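
The shape of such an entry function can be sketched with Python's `unittest` as an analog of gtest; UTOPIA itself targets C/C++, where the corresponding libFuzzer entry point is `LLVMFuzzerTestOneInput`. The test class, its fields and the 5-byte input layout are all hypothetical. The entry function decodes the engine's bytes into the fuzz inputs, then calls the pre-test, test and post-test functions explicitly so that every iteration starts from a fresh fixture.

```python
import struct
import unittest


class StorageTest(unittest.TestCase):
    """Hypothetical unit test whose constants were replaced by fuzz inputs."""

    fuzz_count = 3      # originally the constant 3 in the test body
    fuzz_scale = 1.5    # originally the constant 1.5 in the test body

    def setUp(self):
        self.store = []

    def test_body(self):
        bound = self.fuzz_count % 64            # LoopCount-style limit
        for i in range(bound):
            self.store.append(i * self.fuzz_scale)
        self.assertEqual(len(self.store), bound)

    def tearDown(self):
        self.store.clear()


def fuzz_entry(data: bytes) -> int:
    """Called once per fuzzing iteration with the bytes chosen by the engine."""
    if len(data) < 5:
        return 0                                # not enough bytes for all inputs
    count, scale = struct.unpack_from("<Bf", data)
    case = StorageTest("test_body")
    case.fuzz_count = count
    case.fuzz_scale = scale
    case.setUp()                                # pre-test
    try:
        case.test_body()                        # test
    finally:
        case.tearDown()                         # post-test
    return 0
```

A fuzzing engine would call `fuzz_entry` repeatedly with mutated byte strings; here it can be tried by hand, for example with `fuzz_entry(struct.pack("<Bf", 3, 1.5))`.
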
4.3. Initial seed extraction
  • During UT analysis, UTOPIA obtains the initial seed corpus embedded in the test code, that is, the constant values at the root definitions identified as fuzz targets. These initial seeds let the fuzz driver reach deep program states early in fuzz testing and help the fuzzer explore deep paths.
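
As a small illustration of seed extraction, the sketch below packs the constants found at root definitions into a byte string and decodes them again in the same order. The list of constants and the binary layout are hypothetical; the notes do not describe the format UTOPIA actually uses for its seeds.

```python
import struct

# Constants found at the root definitions of a hypothetical test,
# each paired with a struct format describing its type.
ROOT_CONSTANTS = [("<i", 10), ("<d", 2.5), ("<B", 3)]


def build_seed(constants):
    """Concatenate the original constants into one initial seed input."""
    return b"".join(struct.pack(fmt, value) for fmt, value in constants)


def decode_inputs(data, constants):
    """Split fuzz bytes back into one value per root definition."""
    values, offset = [], 0
    for fmt, _ in constants:
        values.append(struct.unpack_from(fmt, data, offset)[0])
        offset += struct.calcsize(fmt)
    return values
```

Feeding the seed back through `decode_inputs` reproduces the original constants, so the first fuzzing run behaves like the unmodified unit test and mutation starts from a state the developers already considered meaningful.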

4. Implementation

  • UTOPIA is implemented in 39,000 lines of code. Of these, 37,000 lines are C/C++ code that analyzes libraries and unit tests and generates fuzz drivers, while the remaining 2,000 lines are Python scripts that support and simplify the overall analysis and generation process.
  • The code for analysis and fuzz driver synthesis builds on LLVM/Clang's analysis framework, and the conversion of unit test code into fuzz drivers is done with Clang AST Matchers and LibTooling.
    • [Note] LLVM is a general compiler infrastructure project, and Clang is the LLVM project's compiler front end for C/C++/Objective-C. Together they provide a powerful compiler toolchain with strong optimization capabilities.
    • Clang AST Matchers provide pattern matching and searching within the Clang AST, while LibTooling provides developers with a framework and API for creating custom Clang-based tools.

5. Evaluation

  • The paper evaluates UTOPIA from the following aspects:
    • Automation
      • How many unit tests in the projects can UTOPIA convert automatically? How effective are the resulting fuzz drivers?
    • Fuzzing effectiveness
      • How do UTOPIA-generated fuzz drivers compare to hand-written fuzz drivers in terms of code coverage and bugs found?
    • Comparison
      • How does UTOPIA compare to existing methods of automatically generating fuzz drivers?
    • Design decisions
      • How many spurious crashes does UTOPIA prevent, what is the best strategy for handling assertions, and how effective is the analysis of API properties?

1. Automatic fuzz driver generation and its effectiveness

  • As shown in the figure below, of the 5,523 test cases in the projects, the authors excluded 1,039 that are implemented with macros not handled by the prototype implementation, that is, test cases other than TEST and TEST_F (for gtest) or BOOST_AUTO_TEST_CASE and its FIXTURE variant (for Boost); these are labeled Oths.
  • Of the remaining 4,484 test cases, UTOPIA removed 1,769 (39% of the 4,484 test cases examined) while determining root definitions, based on the exclusion criteria described in the paper.
  • In total, UTOPIA automatically generated 2,715 fuzz drivers from the viable candidate test cases (TCs) in these projects.
  • A total of 123 bugs were found. Of these, 109 were discovered in short runs of the 2,715 fuzz drivers generated for 25 OSS projects, and 56 of them were confirmed or fixed by the maintainers. The other 14 were found in about two weeks with the 2,411 fuzz drivers generated for 30 Tizen native libraries; some of these bugs had been dormant for as long as seven years, and they were confirmed by Tizen. UTOPIA used exactly the same API sequences as the test cases yet still found new bugs, which shows that leveraging TCs can uncover new kinds of bugs that developers missed during testing. The fuzz driver source code that UTOPIA generated for the 30 Tizen projects was adopted by the community.
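
The counts quoted in this subsection are consistent with each other, which the following Python arithmetic check makes explicit. It uses only the figures stated above; the variable names are chosen here for readability.

```python
total_cases = 5523           # test cases in the OSS projects
unsupported_macros = 1039    # implemented with macros the prototype skips
excluded_by_criteria = 1769  # dropped during root definition analysis

examined = total_cases - unsupported_macros       # cases actually examined
generated = examined - excluded_by_criteria       # fuzz drivers generated
excluded_share = excluded_by_criteria / examined  # share dropped by criteria

oss_bugs, tizen_bugs = 109, 14
oss_drivers, tizen_drivers = 2715, 2411
total_bugs = oss_bugs + tizen_bugs
total_drivers = oss_drivers + tizen_drivers

print(examined, generated, round(excluded_share * 100), total_bugs, total_drivers)
```

The combined driver count of 5,126 matches the "5K" figure quoted in the experiment summary, and 25 OSS projects plus 30 Tizen libraries give the 55 projects mentioned there.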

3. Comparison with OSS-Fuzz drivers

  • Comparison of the coverage of the hand-written fuzz drivers used in OSS-Fuzz and the fuzz drivers generated automatically by UTOPIA:
  • As shown in the two figures above, UTOPIA's fuzz drivers achieve 20.5% higher coverage on average in 4 of the 6 projects and lower coverage (by 9.7% on average) in the other 2, but they reach unique coverage in all of them.

4. Evaluate UTOPIA’s design decisions

  • The impact of different ways of handling assertions on fuzz testing
    • As shown below, ignoring assertions can have a detrimental effect on fuzz testing.
  • Impact of the ArrayLength, AllocSize and LoopCount properties obtained through library analysis on reducing spurious crashes and crashes caused by harmful fuzz inputs
    • For this evaluation, the authors selected fuzz drivers from three projects whose tested API parameters carry one of the three properties, and compared runs with that property removed. As shown in the figure below, not applying the ArrayLength or AllocSize property results in a dramatic increase in crashes, up to two orders of magnitude, with only a slight increase in coverage. Without the LoopCount property, on the other hand, no difference in crashes is observed, but exec/sec drops significantly, by up to 40%. For the assimp project, removing the AllocSize property reduces coverage and exec/sec to 37% and 2% respectively of the values obtained with the property. For the libtp project, without the ArrayLength property, coverage and exec/sec were poor and crashes increased by a factor of 645. Furthermore, omitting LoopCount in leveldb reduces exec/sec to 41%.

5. Experiment summary

  • UTOPIA can effectively synthesize fuzz drivers from existing unit tests with almost no manual involvement.
  • UTOPIA was successfully applied to 55 open source project libraries, including Tizen and Node.js, and automatically generated 5K fuzz drivers from 8K eligible unit tests.
  • The generated fuzzers were run for approximately 5 million core-hours and discovered 123 bugs.
  • 2.4K of the generated fuzz drivers were adopted into the continuous integration process of Tizen, which shows the quality of the fuzz drivers generated by UTOPIA.

6. Shortcomings

1. Other sources of spurious crashes

  • Unconventional relationships
    • Fuzz drivers cannot be generated correctly for some unconventional, highly customized parameters and relationships.
  • Insufficient error handling
    • When writing unit tests, developers sometimes skip checks that objects were constructed and allocated correctly, and hard-code some parameters unnecessarily. Such a UT may lead to spurious errors once it is turned into a fuzz driver.

2. Limitations of UTOPIA analysis

  • Root definition of file path
    • In some test cases, file path strings are built through multiple string operations. If UTOPIA created a file for fuzz testing and assigned its path at the root definition of the string (before all of these operations), the path actually passed to the API would be wrong. To avoid this, UTOPIA heuristically assigns the generated fuzz file path at the string operation closest to the API call. Because of this heuristic, however, the generated fuzz driver may not reflect the original UT logic.
  • Constant value aliases in logic
    • UTOPIA may have difficulty generating suitable fuzz drivers when test cases use constant values directly.

8. Conclusion

  • The paper proposes UTOPIA, which automatically generates fuzz drivers from available unit tests with little or no human intervention. It not only understands the semantic structure of the unit testing framework, but also analyzes the implementation of the APIs of each library under test. UTOPIA is therefore able to generate numerous fuzz drivers with valid API call sequences in a scalable manner.
  • UTOPIA successfully generated fuzz drivers for 55 popular open source projects, showing that it can be applied widely.
  • The evaluation shows that, compared to hand-written fuzzers, UTOPIA achieves higher code coverage (20.4% on average) in 4 out of 6 projects while fuzzing the APIs that developers are interested in. More importantly, UTOPIA found 123 new bugs in 55 open source projects.

Origin blog.csdn.net/weixin_45100742/article/details/134840509