Introduction to Clang 10 SanitizerCoverage

1.Introduction

LLVM has a simple code coverage instrumentation built in (SanitizerCoverage). It inserts calls to user-defined functions at the function, basic-block, and edge levels. Default implementations of these callbacks are provided, and they implement simple coverage reporting and visualization. However, if you only need coverage visualization, you may want to use SourceBasedCodeCoverage instead.

2.Tracing PCs with guards

With -fsanitize-coverage=trace-pc-guard the compiler will insert the following code on every edge:

__sanitizer_cov_trace_pc_guard(&guard_variable)

Every edge has its own guard_variable (uint32_t).

The compiler will also insert calls to a module constructor:

// The guards are [start, stop).
// This function will be called at least once per DSO and may be called
// more than once with the same values of start/stop.
__sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop);

With an additional ...=trace-pc,indirect-calls flag, __sanitizer_cov_trace_pc_indirect(void *callee) will be inserted on every indirect call.

The functions __sanitizer_cov_trace_pc_* should be defined by the user.

Example:

// trace-pc-guard-cb.cc
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>

// This callback is inserted by the compiler as a module constructor
// into every DSO. 'start' and 'stop' correspond to the
// beginning and end of the section with the guards for the entire
// binary (executable or DSO). The callback will be called at least
// once per DSO and may be called multiple times with the same parameters.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

// This callback is inserted by the compiler on every edge in the
// control flow (some optimizations apply).
// Typically, the compiler will emit the code like this:
//    if(*guard)
//      __sanitizer_cov_trace_pc_guard(guard);
// But for large functions it will emit a simple call:
//    __sanitizer_cov_trace_pc_guard(guard);
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Duplicate the guard check.
  // If you set *guard to 0 this code will not be called again for this edge.
  // Now you can get the PC and do whatever you want:
  //   store it somewhere or symbolize it and print right away.
  // The values of `*guard` are as you set them in
  // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
  // and use them to dereference an array or a bit vector.
  void *PC = __builtin_return_address(0);
  char PcDescr[1024];
  // This function is a part of the sanitizer run-time.
  // To use it, link with AddressSanitizer or other sanitizer.
  __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
  printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}
// trace-pc-guard-example.cc
int sub() {
	int d=9-5;
	return d;}
int foo() {
	int c=sub()+5;
	return c;}
int main() {
	int f=foo();
	return 0;
}
clang++ -g  -fsanitize-coverage=trace-pc-guard trace-pc-guard-example.cc -c
clang++ trace-pc-guard-cb.cc trace-pc-guard-example.o -fsanitize=address
ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out
INIT: 0x530c50 0x530c5c
guard: 0x530c58 3 PC 0x4f86e6 in main trace-pc-guard-example.cc:7
guard: 0x530c54 2 PC 0x4f86b6 in foo() trace-pc-guard-example.cc:4
guard: 0x530c50 1 PC 0x4f8686 in sub() trace-pc-guard-example.cc:1
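The comments in the callback above mention two tricks: using the guard values as consecutive indices into an array, and setting *guard to 0 so an edge is not reported again. A minimal sketch of both, under stated assumptions, follows. The capacity kMaxGuards and the helpers CoveredEdgeCount and EdgeCovered are hypothetical names for illustration; only the two __sanitizer_cov_* signatures come from the article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed capacity. Plain arrays are zero-initialized before any
// constructor runs, so they are safe to touch from the module constructor.
static const uint32_t kMaxGuards = 1 << 16;
static uint8_t g_covered[kMaxGuards];  // g_covered[i] != 0: edge i was executed
static uint32_t g_next_index = 0;      // last guard index handed out
static size_t g_num_covered = 0;       // number of distinct edges executed

extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  if (start == stop || *start) return;  // this DSO is already initialized
  for (uint32_t *g = start; g < stop; ++g) {
    assert(g_next_index + 1 < kMaxGuards);
    *g = ++g_next_index;  // indices start at 1; 0 means "disabled"
  }
}

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  uint32_t idx = *guard;
  if (!idx) return;  // edge already recorded and disabled
  if (!g_covered[idx]) {
    g_covered[idx] = 1;
    ++g_num_covered;
  }
  *guard = 0;  // the compiler-emitted `if (*guard)` check now skips this edge
}

// Hypothetical helpers for reading the collected coverage.
size_t CoveredEdgeCount() { return g_num_covered; }
bool EdgeCovered(uint32_t idx) { return idx < kMaxGuards && g_covered[idx]; }
```

Linked in place of trace-pc-guard-cb.cc, this records each edge at most once, which keeps the run-time cost low but discards hit counts.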

3.Inline 8bit-counters

Experimental, may change or disappear in the future

With -fsanitize-coverage=inline-8bit-counters the compiler will insert inline counter increments on every edge. This is similar to -fsanitize-coverage=trace-pc-guard, but instead of calling a callback the instrumentation simply increments a counter.

The user needs to implement a single function to capture the counters at startup:

extern "C"
void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  // [start,end) is the array of 8-bit counters created for the current DSO.
  // Capture this array in order to read/modify the counters.
}
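Once the [start, end) range is captured, reading coverage is just a scan over the byte array. A minimal sketch, assuming a single instrumented DSO (a program with several DSOs would keep a list of ranges); the globals and the helpers CountHitEdges and ResetCounters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical storage for one DSO's counter array.
static uint8_t *g_counters_begin = nullptr;
static uint8_t *g_counters_end = nullptr;

extern "C" void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  g_counters_begin = reinterpret_cast<uint8_t *>(start);
  g_counters_end = reinterpret_cast<uint8_t *>(end);
}

// Hypothetical helper: number of edges whose counter is non-zero.
size_t CountHitEdges() {
  size_t hit = 0;
  for (uint8_t *c = g_counters_begin; c < g_counters_end; ++c)
    if (*c) ++hit;
  return hit;
}

// Hypothetical helper: clear all counters, e.g. between two fuzzing inputs.
void ResetCounters() {
  for (uint8_t *c = g_counters_begin; c < g_counters_end; ++c) *c = 0;
}
```

Because each counter is only 8 bits wide, a counter can wrap back to 0 after 256 hits, so "non-zero" is a close but not exact proxy for "executed".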

4.PC-Table

Experimental, may change or disappear in the future

Note: for linkers other than LLD, this instrumentation may be incompatible with dead code stripping (-Wl,-gc-sections), resulting in a significant binary size overhead. For more information, see Bug 34636.

With -fsanitize-coverage=pc-table the compiler will create a table of instrumented PCs. This requires either -fsanitize-coverage=inline-8bit-counters or -fsanitize-coverage=trace-pc-guard.

The user needs to implement a function to capture the PC table at startup:

extern "C"
void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                              const uintptr_t *pcs_end) {
  // [pcs_beg,pcs_end) is the array of ptr-sized integers representing
  // pairs [PC,PCFlags] for every instrumented block in the current DSO.
  // Capture this array in order to read the PCs and their Flags.
  // The number of PCs and PCFlags for a given DSO is the same as the number
  // of 8-bit counters (-fsanitize-coverage=inline-8bit-counters) or
  // trace_pc_guard callbacks (-fsanitize-coverage=trace-pc-guard)
  // A PCFlags describes the basic block:
  //  * bit0: 1 if the block is the function entry block, 0 otherwise.
}
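The table is a flat array of alternating PC and PCFlags words, so block i lives at indices 2*i and 2*i+1. A minimal sketch of walking it, assuming a single DSO and assuming (as libFuzzer does) that the i-th pair lines up with the i-th counter or guard; all helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical storage for one DSO's PC table.
static const uintptr_t *g_pcs_begin = nullptr;
static const uintptr_t *g_pcs_end = nullptr;

extern "C" void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                         const uintptr_t *pcs_end) {
  g_pcs_begin = pcs_beg;
  g_pcs_end = pcs_end;
}

// Each instrumented block occupies two words: [PC, PCFlags].
size_t NumInstrumentedBlocks() {
  return static_cast<size_t>(g_pcs_end - g_pcs_begin) / 2;
}

uintptr_t BlockPC(size_t i) {
  assert(i < NumInstrumentedBlocks());
  return g_pcs_begin[2 * i];
}

// bit0 of PCFlags is 1 if the block is a function entry block.
bool IsFunctionEntry(size_t i) {
  assert(i < NumInstrumentedBlocks());
  return (g_pcs_begin[2 * i + 1] & 1) != 0;
}

size_t CountFunctionEntries() {
  size_t n = 0;
  for (size_t i = 0; i < NumInstrumentedBlocks(); ++i)
    if (IsFunctionEntry(i)) ++n;
  return n;
}
```

Combined with the 8-bit counters, the same index i gives both "was this block hit" and "where is it", which is enough to report which functions were entered.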

For example, we can use some of the functions above to collect runtime information from a program (that is, to compute how much of the program is covered):

//foo.cc
#include<iostream>
#include<string>
int add(int i,int j)
{
	return i+j;
}
int main()
{
	std::string s;
	std::string s1="abcdefghijik";
	int i;
	std::cin>>s;
	if(s==s1){
		i=add(3,5);
	}
	else{
		std::cout<<"wrong"<<std::endl;
	}
	return 0;
}

 

// san.cc
#include <stdint.h>
#include <stdio.h>
#include <assert.h>
#include <sanitizer/coverage_interface.h>

#ifdef _WIN32
#define ATTRIBUTE_INTERFACE __declspec(dllexport)
#else
#define ATTRIBUTE_INTERFACE __attribute__((visibility("default")))
#endif

// One [Start, Stop) guard range per instrumented module (executable or DSO).
struct Module {
	uint32_t *Start, *Stop;
};

static const size_t kNumPCs = 1 << 21;
uint8_t __sancov_trace_pc_guard_8bit_counters[kNumPCs];  // hits per guard index
uintptr_t __sancov_trace_pc_pcs[kNumPCs];               // PC recorded per guard index
Module Modules[4096];
size_t NumModules = 0;  // linker-initialized.
size_t NumGuards = 0;   // linker-initialized.

uint8_t *Counters() { return __sancov_trace_pc_guard_8bit_counters; }
uintptr_t *PCs() { return __sancov_trace_pc_pcs; }
// Guard values start at 1, so slot 0 is unused and NumGuards + 1 slots are needed.
size_t GetNumPCs() { return kNumPCs < NumGuards + 1 ? kNumPCs : NumGuards + 1; }

uintptr_t GetPC(size_t Idx) {
	assert(Idx < GetNumPCs());
	return PCs()[Idx];
}

// Number of guard slots that have recorded a PC, i.e. edges executed so far.
size_t GetTotalPCCoverage() {
	size_t Res = 0;
	for (size_t i = 1, N = GetNumPCs(); i < N; i++)
		if (PCs()[i])
			Res++;
	return Res;
}

extern "C" ATTRIBUTE_INTERFACE void __sanitizer_cov_trace_pc_guard(uint32_t *Guard) {
	uintptr_t PC = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
	uint32_t Idx = *Guard;
	__sancov_trace_pc_pcs[Idx] = PC;
	__sancov_trace_pc_guard_8bit_counters[Idx]++;
	printf("GetTotalPCCoverage() is %zu\n", GetTotalPCCoverage());
}

extern "C" ATTRIBUTE_INTERFACE void __sanitizer_cov_trace_pc_guard_init(uint32_t *Start, uint32_t *Stop) {
	if (Start == Stop || *Start) return;  // this module is already initialized
	assert(NumModules < sizeof(Modules) / sizeof(Modules[0]));
	for (uint32_t *P = Start; P < Stop; P++) {
		NumGuards++;
		if (NumGuards == kNumPCs) {
			printf(
			"WARNING: The binary has too many instrumented PCs.\n"
			"         You may want to reduce the size of the binary\n"
			"         for more efficient fuzzing and precise coverage data\n");
		}
		*P = NumGuards % kNumPCs;  // index into the tables above (wraps past kNumPCs)
	}
	Modules[NumModules].Start = Start;
	Modules[NumModules].Stop = Stop;
	NumModules++;
}

The results are as follows:

# clang++ -g  -fsanitize-coverage=trace-pc-guard,inline-8bit-counters,pc-table,trace-cmp,func foo.cc -c
# clang++ san.cc foo.o -fsanitize=address -o a
# ./a
GetTotalPCCoverage() is 1
GetTotalPCCoverage() is 2
GetTotalPCCoverage() is 3
aaaaaaaaaaaaaaaaa
GetTotalPCCoverage() is 4
wrong

 

5.Tracing PCs

With -fsanitize-coverage=trace-pc the compiler will insert __sanitizer_cov_trace_pc() on every edge. With an additional ...=trace-pc,indirect-calls flag, __sanitizer_cov_trace_pc_indirect(void *callee) will be inserted on every indirect call. These callbacks are not implemented in the sanitizer run-time and should be defined by the user. This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller).
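Since the run-time does not provide these two callbacks, a program built with trace-pc must define them itself. A minimal user-space sketch, assuming a fixed-size trace buffer is acceptable: it appends every executed PC and keeps the set of distinct indirect-call targets. The buffer sizes and the helpers NumRecordedPCs and NumDistinctCallees are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size buffers; plain arrays need no constructor, so they
// can be used even by code that runs before main().
static const size_t kMaxRecords = 1 << 16;
static uintptr_t g_pcs[kMaxRecords];
static size_t g_num_pcs = 0;
static uintptr_t g_callees[kMaxRecords];
static size_t g_num_callees = 0;

// Inserted on every edge: append the caller's PC (our return address).
extern "C" void __sanitizer_cov_trace_pc() {
  if (g_num_pcs < kMaxRecords)
    g_pcs[g_num_pcs++] =
        reinterpret_cast<uintptr_t>(__builtin_return_address(0));
}

// Inserted on every indirect call: remember each distinct callee once.
extern "C" void __sanitizer_cov_trace_pc_indirect(void *callee) {
  uintptr_t c = reinterpret_cast<uintptr_t>(callee);
  for (size_t i = 0; i < g_num_callees; ++i)
    if (g_callees[i] == c) return;  // linear scan, kept simple on purpose
  if (g_num_callees < kMaxRecords) g_callees[g_num_callees++] = c;
}

size_t NumRecordedPCs() { return g_num_pcs; }
size_t NumDistinctCallees() { return g_num_callees; }
```

Unlike trace-pc-guard, there is no guard to switch an edge off, so this trace grows with every executed edge; the buffer simply stops accepting entries when full.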

6.Instrumentation points

  • Edge (default): edges are instrumented (see below).
  • BB: basic blocks are instrumented.
  • Function: only the entry block of every function is instrumented.

Use these flags together with trace-pc-guard or trace-pc, like this: -fsanitize-coverage=func,trace-pc-guard.

When edge or bb is used, some of the edges/blocks may still be left uninstrumented (pruned) if such instrumentation is considered redundant. Use no-prune (e.g. -fsanitize-coverage=bb,no-prune,trace-pc-guard) to disable pruning. This can be useful for better coverage visualization.

7.Edge coverage

Consider the following code:

void foo(int *a) {
  if (a)
    *a = 0;
}

It contains 3 basic blocks; let's name them A, B, and C:

A
|\
| \
|  B
| /
|/
C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B and B=>C were executed, but we still don't know whether the edge A=>C was executed. Such edges of the control flow graph are called critical: the source block has more than one successor and the destination block has more than one predecessor, so neither endpoint's coverage tells us whether the edge was taken. Edge-level coverage simply splits all critical edges by introducing new dummy blocks and then instruments those blocks:

A
|\
| \
D  B
| /
|/
C
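The rule described above can be written down directly on a toy control flow graph. This is a conceptual sketch on an adjacency-list CFG, not LLVM's actual edge-splitting code; the CFG alias and the functions CriticalEdges and SplitCriticalEdges are hypothetical names. In the example, blocks A, B, C are numbered 0, 1, 2 and the new block D gets number 3.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A CFG as adjacency lists: succs[b] holds the successors of block b.
using CFG = std::vector<std::vector<int>>;

// An edge u->v is critical when u has more than one successor and
// v has more than one predecessor.
std::vector<std::pair<int, int>> CriticalEdges(const CFG &succs) {
  std::vector<size_t> preds(succs.size(), 0);
  for (const auto &out : succs)
    for (int v : out) ++preds[v];
  std::vector<std::pair<int, int>> result;
  for (int u = 0; u < static_cast<int>(succs.size()); ++u)
    if (succs[u].size() > 1)
      for (int v : succs[u])
        if (preds[v] > 1) result.push_back({u, v});
  return result;
}

// Replace every critical edge u->v by u->d->v with a new dummy block d.
// Instrumenting d then tells us whether u->v was executed.
CFG SplitCriticalEdges(CFG succs) {
  const auto critical = CriticalEdges(succs);
  for (const auto &e : critical) {
    int u = e.first, v = e.second;
    int d = static_cast<int>(succs.size());
    succs.push_back({v});  // d has a single successor: v
    for (int &t : succs[u])
      if (t == v) { t = d; break; }  // redirect u's edge to d
  }
  return succs;
}
```

For the graph of foo (A with successors B and C, B with successor C), only A=>C is reported as critical, and after splitting the graph has no critical edges left.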

8.Tracing data flow

This supports data-flow-guided fuzzing. With -fsanitize-coverage=trace-cmp the compiler will insert extra instrumentation around comparison instructions and switch statements. Similarly, with -fsanitize-coverage=trace-div the compiler will instrument integer division instructions (to capture the right argument of the division), and with -fsanitize-coverage=trace-gep it will instrument LLVM GEP instructions (to capture array indices).

Unless the no-prune option is provided, some of the comparison instructions will not be instrumented.

// Called before a comparison instruction.
// Arg1 and Arg2 are arguments of the comparison.
void __sanitizer_cov_trace_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_cmp8(uint64_t Arg1, uint64_t Arg2);

// Called before a comparison instruction if exactly one of the arguments is constant.
// Arg1 and Arg2 are arguments of the comparison, Arg1 is a compile-time constant.
// These callbacks are emitted by -fsanitize-coverage=trace-cmp since 2017-08-11
void __sanitizer_cov_trace_const_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_const_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_const_cmp8(uint64_t Arg1, uint64_t Arg2);

// Called before a switch statement.
// Val is the switch operand.
// Cases[0] is the number of case constants.
// Cases[1] is the size of Val in bits.
// Cases[2:] are the case constants.
void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

// Called before a division statement.
// Val is the second argument of division.
void __sanitizer_cov_trace_div4(uint32_t Val);
void __sanitizer_cov_trace_div8(uint64_t Val);

// Called before a GetElementPtr (GEP) instruction
// for every non-constant array index.
void __sanitizer_cov_trace_gep(uintptr_t Idx);
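The Cases layout of __sanitizer_cov_trace_switch is the least obvious of these signatures, so here is a minimal sketch that decodes it. The SwitchEvent struct and the global g_last_switch are hypothetical; the sketch compares raw 64-bit values and does not try to handle how narrower or signed operands are extended.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record of the most recent switch seen by the callback.
struct SwitchEvent {
  uint64_t val;        // the switch operand
  uint64_t bit_width;  // size of val in bits (Cases[1])
  uint64_t num_cases;  // number of case constants (Cases[0])
  bool matched;        // true if val equals one of the case constants
};
static SwitchEvent g_last_switch;

extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
  const uint64_t n = Cases[0];     // number of case constants
  const uint64_t bits = Cases[1];  // size of Val in bits
  bool matched = false;
  for (uint64_t i = 0; i < n; ++i)  // the constants start at Cases[2]
    if (Cases[2 + i] == Val) matched = true;
  g_last_switch = {Val, bits, n, matched};
}
```

A data-flow-guided fuzzer would typically go one step further and feed the unmatched case constants back into its input mutations.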

For example:

//foo.cc
#include<iostream>
#include<string>
int add(int i,int j)
{
	return i+j;
}
int main()
{
	std::string s;
	int i;
	std::cin>>s;
	if(s[0]=='w'){
		i=add(3,5);
	}
	else{
		std::cout<<"wrong"<<std::endl;
	}
	return 0;
}

// san.cc
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>
#include <sanitizer/coverage_interface.h>
// Arg1 is the compile-time constant, Arg2 is the other (non-constant) operand.
extern "C" void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2)
{
	uintptr_t PC = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
	printf("cmp4PC is %" PRIuPTR ",Arg1 is %u,Arg2 is %u\n", PC, Arg1, Arg2);
}

The results are as follows:

# clang++ -g  -fsanitize-coverage=trace-pc-guard,inline-8bit-counters,pc-table,trace-cmp foo.cc -c
# clang++ san.cc foo.o -fsanitize=address
# ./a.out 
qqqqqqqqqqqqqqq
cmp4PC is 5211447,Arg1 is 119,Arg2 is 113
wrong
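In this run Arg1 is 119 ('w', the constant in s[0]=='w') and Arg2 is 113 ('q', the input). That is exactly what makes trace-cmp useful for fuzzing: a failed comparison reveals a value worth trying. A minimal sketch of collecting such constants into a dictionary, assuming a small fixed-size table is enough; g_dict, AddToDict, and DictContains are hypothetical names, not part of any fuzzer's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical dictionary of constants the program compared its data against.
static const size_t kMaxDict = 256;
static uint32_t g_dict[kMaxDict];
static size_t g_dict_size = 0;

static void AddToDict(uint32_t v) {
  for (size_t i = 0; i < g_dict_size; ++i)
    if (g_dict[i] == v) return;  // keep entries unique
  if (g_dict_size < kMaxDict) g_dict[g_dict_size++] = v;
}

// Arg1 is the compile-time constant, Arg2 the other operand.
extern "C" void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2) {
  if (Arg1 != Arg2) AddToDict(Arg1);  // a failed compare suggests a value to try
}

bool DictContains(uint32_t v) {
  for (size_t i = 0; i < g_dict_size; ++i)
    if (g_dict[i] == v) return true;
  return false;
}
```

After the run above, the dictionary would hold 119, so a mutator that tries dictionary values could produce an input starting with 'w' and reach the add(3,5) branch.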

9.Default implementation

The sanitizer run-time (AddressSanitizer, MemorySanitizer, etc.) provides a default implementation of some of the coverage callbacks. You can use this implementation to dump coverage to disk at process exit.

Example:

//cov.cc
#include<stdio.h>
__attribute__((noinline))
void foo(){printf("foo\n");}
int main(int argc,char **argv)
{
	if(argc==2)
	{
		foo();
	}
	printf("main\n");
}
% clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=trace-pc-guard
% ASAN_OPTIONS=coverage=1 ./a.out; wc -c *.sancov
main
SanitizerCoverage: ./a.out.7312.sancov 2 PCs written
24 a.out.7312.sancov
% ASAN_OPTIONS=coverage=1 ./a.out foo ; wc -c *.sancov
foo
main
SanitizerCoverage: ./a.out.7316.sancov 3 PCs written
24 a.out.7312.sancov
32 a.out.7316.sancov

Every time you run an executable instrumented with SanitizerCoverage, one *.sancov file is created during process shutdown. If the executable is dynamically linked against instrumented DSOs, one *.sancov file will also be created for every DSO.

10.Sancov data format

The format of *.sancov files is very simple: the first 8 bytes are the magic, one of 0xC0BFFFFFFFFFFF64 and 0xC0BFFFFFFFFFFF32. The last byte of the magic defines the size of the following offsets (0x64 for 64-bit offsets, 0x32 for 32-bit offsets). The rest of the data is the offsets in the corresponding binary/DSO that were executed during the run.
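The format is simple enough to parse by hand. A minimal sketch, assuming the file was written on a machine with the same byte order as the reader (magic and offsets are read as native integers); ParseSancov is a hypothetical helper, and the sancov tool remains the supported way to process these files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Magic values from the format description above.
const uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;  // 64-bit offsets follow
const uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;  // 32-bit offsets follow

// Parses the raw bytes of a .sancov file into a list of offsets.
// Returns false for an unknown magic or a truncated body.
bool ParseSancov(const std::vector<uint8_t> &data,
                 std::vector<uint64_t> *offsets) {
  if (data.size() < 8) return false;
  uint64_t magic;
  std::memcpy(&magic, data.data(), 8);
  size_t width;
  if (magic == kMagic64)
    width = 8;
  else if (magic == kMagic32)
    width = 4;
  else
    return false;
  if ((data.size() - 8) % width != 0) return false;
  offsets->clear();
  for (size_t pos = 8; pos < data.size(); pos += width) {
    if (width == 8) {
      uint64_t v;
      std::memcpy(&v, data.data() + pos, 8);
      offsets->push_back(v);
    } else {
      uint32_t v;
      std::memcpy(&v, data.data() + pos, 4);
      offsets->push_back(v);
    }
  }
  return true;
}
```

This also explains the file sizes in the default-implementation example: 24 bytes is 8 bytes of magic plus 2 × 8-byte offsets ("2 PCs written"), and 32 bytes holds 3 offsets.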

11.Sancov Tool

A simple sancov tool is provided to process coverage files. The tool is part of the LLVM project and is currently supported only on Linux. It can handle symbolization tasks autonomously, without any extra support from the environment. You need to pass the .sancov files (named <module_name>.<pid>.sancov) and the paths to all corresponding binary ELF files. Sancov matches these files using module names and binary file names.

12.Coverage Reports

Experimental

.sancov files do not contain enough information to generate a source-level coverage report. The missing information is contained in the debug info of the binary. Therefore the .sancov file must first be symbolized to produce a .symcov file:

sancov -symbolize my_program.123.sancov my_program > my_program.123.symcov

You can then browse the .symcov file overlaid on the source code by running the tools/sancov/coverage-report-server.py script, which starts an HTTP server.

13.Output directory

By default, .sancov files are created in the current working directory. This can be changed with ASAN_OPTIONS=coverage_dir=/path:

% ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
% ls -l /tmp/cov/*sancov
-rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
-rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

 


Origin blog.csdn.net/zhang14916/article/details/100924489