Clang 10 introduction: SanitizerCoverage

1.Introduction

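LLVM has a simple built-in code coverage instrumentation (SanitizerCoverage). It inserts calls to user-defined functions at the function, basic-block, and edge levels. Default implementations of these callbacks are provided, and they implement simple coverage reporting and visualization. However, if you only need coverage visualization, you may want to use SourceBasedCodeCoverage instead.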

2.Tracing PCs with guards

With -fsanitize-coverage=trace-pc-guard, the compiler inserts the following code on every edge:

__sanitizer_cov_trace_pc_guard(&guard_variable)

Every edge has its own guard variable (uint32_t).

The compiler will also insert calls to a module constructor:

// The guards are [start, stop).
// This function will be called at least once per DSO and may be called
// more than once with the same values of start/stop.
__sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop);

With the additional ...=trace-pc,indirect-calls flag, __sanitizer_cov_trace_pc_indirect(void *callee) will be inserted on every indirect call.

The functions __sanitizer_cov_trace_pc_* should be defined by the user.

Example:

// trace-pc-guard-cb.cc
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>

// This callback is inserted by the compiler as a module constructor
// into every DSO. 'start' and 'stop' correspond to the
// beginning and end of the section with the guards for the entire
// binary (executable or DSO). The callback will be called at least
// once per DSO and may be called multiple times with the same parameters.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

// This callback is inserted by the compiler on every edge in the
// control flow (some optimizations apply).
// Typically, the compiler will emit the code like this:
//    if(*guard)
//      __sanitizer_cov_trace_pc_guard(guard);
// But for large functions it will emit a simple call:
//    __sanitizer_cov_trace_pc_guard(guard);
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Duplicate the guard check.
  // If you set *guard to 0 this code will not be called again for this edge.
  // Now you can get the PC and do whatever you want:
  //   store it somewhere or symbolize it and print right away.
  // The values of `*guard` are as you set them in
  // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
  // and use them to index into an array or a bit vector.
  void *PC = __builtin_return_address(0);
  char PcDescr[1024];
  // This function is a part of the sanitizer run-time.
  // To use it, link with AddressSanitizer or other sanitizer.
  __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
  printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}

// trace-pc-guard-example.cc
int sub() {
	int d=9-5;
	return d;}
int foo() {
	int c=sub()+5;
	return c;}
int main() {
	int f=foo();
	return 0;
}

clang++ -g  -fsanitize-coverage=trace-pc-guard trace-pc-guard-example.cc -c
clang++ trace-pc-guard-cb.cc trace-pc-guard-example.o -fsanitize=address
ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out
INIT: 0x530c50 0x530c5c
guard: 0x530c58 3 PC 0x4f86e6 in main trace-pc-guard-example.cc:7
guard: 0x530c54 2 PC 0x4f86b6 in foo() trace-pc-guard-example.cc:4
guard: 0x530c50 1 PC 0x4f8686 in sub() trace-pc-guard-example.cc:1
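
The callback comments above mention that setting *guard to 0 stops further callbacks for that edge. A minimal sketch of that idea, reporting each edge only on its first hit, could look like the following. The counters and the UniqueEdges() accessor are hypothetical names introduced for illustration, and the file is assumed to be compiled without -fsanitize-coverage, like trace-pc-guard-cb.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical bookkeeping for this sketch.
static uint32_t g_num_guards = 0;    // Guards numbered so far.
static uint32_t g_unique_edges = 0;  // Edges hit at least once.

extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  assert(start <= stop);
  if (start == stop || *start) return;  // Initialize only once.
  for (uint32_t *g = start; g < stop; ++g)
    *g = ++g_num_guards;  // Non-zero ids, so every edge starts enabled.
}

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // This edge was already reported.
  std::printf("first hit of edge #%u\n", *guard);
  ++g_unique_edges;
  *guard = 0;  // Disable further reports for this edge.
}

// Number of distinct edges reported so far.
uint32_t UniqueEdges() { return g_unique_edges; }
```

Because the guard is cleared after the first hit, hot loops cost only the inlined `if (*guard)` check on later iterations, at the price of losing hit counts.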

3.Inline 8bit-counters

Experimental; may change or disappear in the future.

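With -fsanitize-coverage=inline-8bit-counters, the compiler inserts inline counter increments on every edge. This is similar to -fsanitize-coverage=trace-pc-guard, but instead of a callback the instrumentation simply increments a counter.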

Users need to implement a single function to capture the counters at startup.

extern "C"
void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  // [start,end) is the array of 8-bit counters created for the current DSO.
  // Capture this array in order to read/modify the counters.
}
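
The empty callback above only shows the signature. As a rough sketch of what "capturing" the array might mean, the code below remembers each [start, end) range and later counts how many counters are non-zero. CounterRange, kMaxRanges, CountAllEdges and CountCoveredEdges are hypothetical names, and the limit of 64 ranges is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>

// One [start, end) counter range per instrumented DSO (hypothetical type).
// Plain arrays are used on purpose: the callback may run before the C++
// static constructors of this file have executed.
struct CounterRange {
  char *start;
  char *end;
};

static const size_t kMaxRanges = 64;  // Assumed upper bound on DSOs.
static CounterRange g_ranges[kMaxRanges];
static size_t g_num_ranges = 0;

extern "C" void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  assert(start <= end);
  // Defensive: ignore a range that has already been registered.
  for (size_t i = 0; i < g_num_ranges; ++i)
    if (g_ranges[i].start == start) return;
  if (g_num_ranges == kMaxRanges) return;  // Table full, ignore extras.
  g_ranges[g_num_ranges].start = start;
  g_ranges[g_num_ranges].end = end;
  ++g_num_ranges;
}

// Total number of counters (instrumented edges) registered so far.
size_t CountAllEdges() {
  size_t total = 0;
  for (size_t i = 0; i < g_num_ranges; ++i)
    total += static_cast<size_t>(g_ranges[i].end - g_ranges[i].start);
  return total;
}

// Number of counters that are non-zero, i.e. edges executed at least once.
size_t CountCoveredEdges() {
  size_t covered = 0;
  for (size_t i = 0; i < g_num_ranges; ++i)
    for (const char *p = g_ranges[i].start; p < g_ranges[i].end; ++p)
      if (*p) ++covered;
  return covered;
}
```

A tool could call CountCoveredEdges() at exit, or zero the counters between inputs to measure per-input coverage.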

4.PC-Table

Experimental; may change or disappear in the future.

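Note: with linkers other than LLD, this instrumentation may be incompatible with dead code stripping (-Wl,-gc-sections), which results in a significant binary size overhead. For more information, see Bug 34636.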

With -fsanitize-coverage=pc-table, the compiler creates a table of instrumented PCs. This requires either -fsanitize-coverage=inline-8bit-counters or -fsanitize-coverage=trace-pc-guard.

Users need to implement a single function to capture the PC table at startup:

extern "C"
void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                              const uintptr_t *pcs_end) {
  // [pcs_beg,pcs_end) is the array of ptr-sized integers representing
  // pairs [PC,PCFlags] for every instrumented block in the current DSO.
  // Capture this array in order to read the PCs and their Flags.
  // The number of PCs and PCFlags for a given DSO is the same as the number
  // of 8-bit counters (-fsanitize-coverage=inline-8bit-counters) or
  // trace_pc_guard callbacks (-fsanitize-coverage=trace-pc-guard)
  // A PCFlags describes the basic block:
  //  * bit0: 1 if the block is the function entry block, 0 otherwise.
}
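
To make the [PC,PCFlags] layout described in the comments more concrete, here is a small sketch that stores the table and counts instrumented blocks and function entry blocks using bit0 of PCFlags. The globals and the two helper functions are hypothetical, and for simplicity the sketch only remembers the table of the last DSO that registered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical globals: the most recently registered PC table.
static const uintptr_t *g_pcs_beg = nullptr;
static const uintptr_t *g_pcs_end = nullptr;

extern "C" void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                         const uintptr_t *pcs_end) {
  // The table is a sequence of [PC, PCFlags] pairs, so the number of
  // pointer-sized words must be even.
  assert((pcs_end - pcs_beg) % 2 == 0);
  g_pcs_beg = pcs_beg;
  g_pcs_end = pcs_end;
}

// Number of instrumented blocks, i.e. number of [PC, PCFlags] pairs.
size_t NumInstrumentedBlocks() {
  return static_cast<size_t>(g_pcs_end - g_pcs_beg) / 2;
}

// Number of entries whose PCFlags bit0 is set (function entry blocks).
size_t NumFunctionEntries() {
  size_t entries = 0;
  for (const uintptr_t *p = g_pcs_beg; p < g_pcs_end; p += 2)
    if (p[1] & 1) ++entries;  // p[0] is the PC, p[1] is PCFlags.
  return entries;
}
```

Since the table has one entry per counter or guard, index i in the PC table describes the same block as counter i or guard i of that DSO.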

For example, we can use some of the functions above to collect runtime information from a program (that is, to compute its code coverage).

//foo.cc
#include<iostream>
#include<string>
int add(int i,int j)
{
	return i+j;
}
int main()
{
	std::string s;
	std::string s1="abcdefghijik";
	int i;
	std::cin>>s;
	if(s==s1){
		i=add(3,5);
	}
	else{
		std::cout<<"wrong"<<std::endl;
	}
	return 0;
}

//san.cc
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>
#include <assert.h>
#include <vector>
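// Pick exactly one definition; defining the macro twice is a redefinition.
#ifdef _WIN32
#define ATTRIBUTE_INTERFACE __declspec(dllexport)
#else
#define ATTRIBUTE_INTERFACE __attribute__((visibility("default")))
#endif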
struct Module {
	uint32_t *Start, *Stop;
};

static const size_t kNumPCs = 1 << 21;
uint8_t __sancov_trace_pc_guard_8bit_counters[kNumPCs];
uintptr_t __sancov_trace_pc_pcs[kNumPCs];
Module Modules[4096];
size_t NumModules=0;  // linker-initialized.
size_t NumGuards=0;  // linker-initialized.
uint8_t *Counters() {
	return __sancov_trace_pc_guard_8bit_counters;
}
uintptr_t *PCs(){
	return __sancov_trace_pc_pcs;
}
size_t GetNumPCs() { return kNumPCs<NumGuards + 1?kNumPCs:NumGuards + 1; }
//std::vector<uintptr_t> PCsCopy(GetNumPCs());
uintptr_t *PCs();
uintptr_t GetPC(size_t Idx) {
	assert(Idx < GetNumPCs());
	return PCs()[Idx];
}
size_t GetTotalPCCoverage() {
	size_t Res = 0;
	for (size_t i = 1, N = GetNumPCs(); i < N; i++)
		if (PCs()[i])
			Res++;
	return Res;
}
//ATTRIBUTE_INTERFACE
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *Guard) {
	uintptr_t PC = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
	uint32_t Idx = *Guard;
	__sancov_trace_pc_pcs[Idx] = PC;
	__sancov_trace_pc_guard_8bit_counters[Idx]++;
	//size_t NumFeatures = CollectFeatures([&](size_t Feature) -> bool {return Feature%3;});
	printf("GetTotalPCCoverage() is %zu\n",GetTotalPCCoverage());
	//GetNumPCs
}
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *Start, uint32_t *Stop) {
	if (Start == Stop || *Start) return;
	assert(NumModules < sizeof(Modules) / sizeof(Modules[0]));
	for (uint32_t *P = Start; P < Stop; P++) {
		NumGuards++;
		if (NumGuards == kNumPCs) {
			printf(
			"WARNING: The binary has too many instrumented PCs.\n"
			"         You may want to reduce the size of the binary\n"
			"         for more efficient fuzzing and precise coverage data\n");}
		*P = NumGuards % kNumPCs;
	}
	Modules[NumModules].Start = Start;
	Modules[NumModules].Stop = Stop;
	NumModules++;
}

The output is as follows:

# clang++ -g  -fsanitize-coverage=trace-pc-guard,inline-8bit-counters,pc-table,trace-cmp,func foo.cc -c
# clang++ san.cc foo.o -fsanitize=address -o a
# ./a
GetTotalPCCoverage() is 1
GetTotalPCCoverage() is 2
GetTotalPCCoverage() is 3
aaaaaaaaaaaaaaaaa
GetTotalPCCoverage() is 4
wrong

5.Tracing PCs

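With -fsanitize-coverage=trace-pc, the compiler inserts __sanitizer_cov_trace_pc() on every edge. With the additional ...=trace-pc,indirect-calls flag, __sanitizer_cov_trace_pc_indirect(void *callee) is inserted on every indirect call. These callbacks are not implemented in the Sanitizer run-time and should be defined by the user. This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller).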
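
Since these two callbacks must come from the user, a minimal sketch of a possible implementation is shown below. It records each distinct caller PC in a fixed-size table. The table, its capacity, RecordPC and the accessor functions are hypothetical, the linear search is only acceptable for a toy example, and the file is assumed to be compiled without -fsanitize-coverage so that the callbacks are not instrumented themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static const size_t kMaxPCs = 4096;  // Assumed capacity for this sketch.
static uintptr_t g_pcs[kMaxPCs];     // Unique PCs seen so far.
static size_t g_num_pcs = 0;
static size_t g_num_indirect_calls = 0;

// Stores pc unless it is already present or the table is full.
static void RecordPC(uintptr_t pc) {
  for (size_t i = 0; i < g_num_pcs; ++i)
    if (g_pcs[i] == pc) return;
  if (g_num_pcs < kMaxPCs) g_pcs[g_num_pcs++] = pc;
}

// Inserted by the compiler on every edge with -fsanitize-coverage=trace-pc.
// noinline keeps the return address pointing at the instrumented edge.
extern "C" __attribute__((noinline)) void __sanitizer_cov_trace_pc() {
  RecordPC(reinterpret_cast<uintptr_t>(__builtin_return_address(0)));
}

// Inserted on every indirect call when indirect-calls is also given.
extern "C" void __sanitizer_cov_trace_pc_indirect(void *callee) {
  (void)callee;  // A real tool might record (caller, callee) pairs here.
  ++g_num_indirect_calls;
}

size_t NumUniquePCs() { return g_num_pcs; }
size_t NumIndirectCalls() { return g_num_indirect_calls; }

uintptr_t GetPC(size_t idx) {
  assert(idx < g_num_pcs);
  return g_pcs[idx];
}
```

Unlike trace-pc-guard, there is no guard variable here, so the only way to identify an edge is through the return address.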

6.Instrumentation points

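  • edge (default): edges are instrumented (see below).
  • bb: basic blocks are instrumented.
  • func: only the entry block of every function is instrumented.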

Use these flags together with trace-pc-guard or trace-pc, like this: -fsanitize-coverage=func,trace-pc-guard

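When edge or bb is used, some of the edges/blocks may still be left uninstrumented (pruned) if such instrumentation is considered redundant. Use no-prune (e.g. -fsanitize-coverage=bb,no-prune,trace-pc-guard) to disable pruning. This could be useful for better coverage visualization.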

7.Edge coverage

Consider this code:

void foo(int *a) {
  if (a)
    *a = 0;
}

It contains 3 basic blocks; let's name them A, B, C:

A
|\
| \
|  B
| /
|/
C

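If blocks A, B, and C are all covered, we know for certain that the edges A=>B and B=>C were executed, but we still don't know whether the edge A=>C was executed. Such edges of the control flow graph are called critical edges. Edge-level coverage simply splits all critical edges by introducing new dummy blocks and then instruments those blocks: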

A
|\
| \
D  B
| /
|/
C
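
The splitting step can be illustrated on a toy graph. The sketch below is not LLVM's implementation; it uses the usual definition of a critical edge (the source block has more than one successor and the destination has more than one predecessor) and assumes there are no duplicate edges. The names Edge, FindCriticalEdges and SplitCriticalEdges, and the dummy block names D0, D1, ... are made up for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;  // (from, to)

// True if the source has several successors and the destination has
// several predecessors.
static bool IsCritical(const Edge &e, std::map<std::string, int> &out_degree,
                       std::map<std::string, int> &in_degree) {
  return out_degree[e.first] > 1 && in_degree[e.second] > 1;
}

static void CountDegrees(const std::vector<Edge> &edges,
                         std::map<std::string, int> *out_degree,
                         std::map<std::string, int> *in_degree) {
  for (const Edge &e : edges) {
    ++(*out_degree)[e.first];
    ++(*in_degree)[e.second];
  }
}

// Returns all critical edges of the graph.
std::vector<Edge> FindCriticalEdges(const std::vector<Edge> &edges) {
  std::map<std::string, int> out_degree, in_degree;
  CountDegrees(edges, &out_degree, &in_degree);
  std::vector<Edge> critical;
  for (const Edge &e : edges)
    if (IsCritical(e, out_degree, in_degree)) critical.push_back(e);
  return critical;
}

// Replaces every critical edge X=>Y by X=>Dn and Dn=>Y, where Dn is a
// fresh dummy block. Other edges are kept unchanged.
std::vector<Edge> SplitCriticalEdges(const std::vector<Edge> &edges) {
  std::map<std::string, int> out_degree, in_degree;
  CountDegrees(edges, &out_degree, &in_degree);
  std::vector<Edge> result;
  int next_dummy = 0;
  for (const Edge &e : edges) {
    if (!IsCritical(e, out_degree, in_degree)) {
      result.push_back(e);
      continue;
    }
    std::string dummy = "D" + std::to_string(next_dummy++);
    result.push_back(Edge(e.first, dummy));
    result.push_back(Edge(dummy, e.second));
  }
  return result;
}
```

For the graph above (A=>B, A=>C, B=>C), only A=>C is critical, and splitting it yields A=>D0 and D0=>C. After that, covering block D0 is equivalent to covering the original edge A=>C, which is why instrumenting blocks is enough.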

8.Tracing data flow

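Support for data-flow-guided fuzzing. With -fsanitize-coverage=trace-cmp, the compiler inserts extra instrumentation around comparison instructions and switch statements. Similarly, with -fsanitize-coverage=trace-div the compiler instruments integer division instructions (to capture the right argument of division), and with -fsanitize-coverage=trace-gep it instruments LLVM GEP instructions (to capture array indices).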

Unless the no-prune option is provided, some of the comparison instructions will not be instrumented.

// Called before a comparison instruction.
// Arg1 and Arg2 are arguments of the comparison.
void __sanitizer_cov_trace_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_cmp8(uint64_t Arg1, uint64_t Arg2);

// Called before a comparison instruction if exactly one of the arguments is constant.
// Arg1 and Arg2 are arguments of the comparison, Arg1 is a compile-time constant.
// These callbacks are emitted by -fsanitize-coverage=trace-cmp since 2017-08-11
void __sanitizer_cov_trace_const_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_const_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_const_cmp8(uint64_t Arg1, uint64_t Arg2);

// Called before a switch statement.
// Val is the switch operand.
// Cases[0] is the number of case constants.
// Cases[1] is the size of Val in bits.
// Cases[2:] are the case constants.
void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

// Called before a division statement.
// Val is the second argument of division.
void __sanitizer_cov_trace_div4(uint32_t Val);
void __sanitizer_cov_trace_div8(uint64_t Val);

// Called before a GetElementPtr (GEP) instruction
// for every non-constant array index.
void __sanitizer_cov_trace_gep(uintptr_t Idx);
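
The layout of the Cases array passed to __sanitizer_cov_trace_switch is easy to get wrong, so here is a small sketch that decodes it. FindMatchingCase, LastMatch and the global g_last_match are hypothetical helpers for illustration; a real fuzzer would more likely feed the case constants into its mutation dictionary.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Returns the index (among the case constants) of the constant equal to
// Val, or -1 if no constant matches.
int FindMatchingCase(uint64_t Val, const uint64_t *Cases) {
  const uint64_t num_cases = Cases[0];  // Cases[1] holds the bit width.
  for (uint64_t i = 0; i < num_cases; ++i)
    if (Cases[2 + i] == Val) return static_cast<int>(i);
  return -1;
}

static int g_last_match = -1;  // Hypothetical state, kept for inspection.

extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
  assert(Cases != nullptr);
  g_last_match = FindMatchingCase(Val, Cases);
  std::printf("switch(%llu): %llu cases, %llu-bit operand, match index %d\n",
              static_cast<unsigned long long>(Val),
              static_cast<unsigned long long>(Cases[0]),
              static_cast<unsigned long long>(Cases[1]), g_last_match);
}

int LastMatch() { return g_last_match; }
```

A result of -1 means the operand matched none of the listed constants, which is exactly the situation where a fuzzer would want to try the constants from Cases[2:] as new inputs.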

For example:

//foo.cc
#include<iostream>
#include<string>
int add(int i,int j)
{
	return i+j;
}
int main()
{
	std::string s;
	int i;
	std::cin>>s;
	if(s[0]=='w'){
		i=add(3,5);
	}
	else{
		std::cout<<"wrong"<<std::endl;
	}
	return 0;
}

//san.cc
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>
extern "C" void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2)
{
	uintptr_t PC = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
	printf("cmp4PC is %lu,Arg1 is %u,Arg2 is %u\n",PC,Arg1,Arg2);
}

The output is as follows:

# clang++ -g  -fsanitize-coverage=trace-pc-guard,inline-8bit-counters,pc-table,trace-cmp foo.cc -c
# clang++ san.cc foo.o -fsanitize=address
# ./a.out 
qqqqqqqqqqqqqqq
cmp4PC is 5211447,Arg1 is 119,Arg2 is 113
wrong

9.Default implementation

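The sanitizer run-time (AddressSanitizer, MemorySanitizer, etc.) provides a default implementation of some of the coverage callbacks. You may use this implementation to dump the coverage to disk at process exit.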

Example:

//cov.cc
#include<stdio.h>
__attribute__((noinline))
void foo(){printf("foo\n");}
int main(int argc,char **argv)
{
	if(argc==2)
	{
		foo();
	}
	printf("main\n");
}

% clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=trace-pc-guard
% ASAN_OPTIONS=coverage=1 ./a.out; wc -c *.sancov
main
SanitizerCoverage: ./a.out.7312.sancov 2 PCs written
24 a.out.7312.sancov
% ASAN_OPTIONS=coverage=1 ./a.out foo ; wc -c *.sancov
foo
main
SanitizerCoverage: ./a.out.7316.sancov 3 PCs written
24 a.out.7312.sancov
32 a.out.7316.sancov

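Every time you run an executable instrumented with SanitizerCoverage, one *.sancov file is created during process shutdown. If the executable is dynamically linked against instrumented DSOs, one *.sancov file is also created for every DSO.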

10.Sancov data format

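The format of *.sancov files is very simple: the first 8 bytes are the magic, one of 0xC0BFFFFFFFFFFF64 and 0xC0BFFFFFFFFFFF32. The last byte of the magic defines the size of the following offsets. The rest of the data is the offsets in the corresponding binary/DSO that were executed during the run.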
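
As a sanity check of this description, the 24-byte file reported for 2 PCs in the previous section is 8 bytes of magic plus two 8-byte offsets. A minimal sketch of a parser for such a file is given below. ParseSancov and the constant names are hypothetical, and the sketch assumes the file was written on a machine with the same byte order as the one reading it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

const uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;  // 64-bit offsets follow.
const uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;  // 32-bit offsets follow.

// Parses the raw bytes of a .sancov file into a list of offsets.
// Returns false if the magic is unknown or the payload size is not a
// multiple of the offset width. Assumes host byte order.
bool ParseSancov(const std::vector<uint8_t> &data,
                 std::vector<uint64_t> *offsets) {
  assert(offsets != nullptr);
  offsets->clear();
  if (data.size() < sizeof(uint64_t)) return false;

  uint64_t magic = 0;
  std::memcpy(&magic, data.data(), sizeof(magic));
  size_t width = 0;
  if (magic == kMagic64)
    width = 8;
  else if (magic == kMagic32)
    width = 4;
  else
    return false;

  const size_t payload = data.size() - sizeof(magic);
  if (payload % width != 0) return false;

  for (size_t pos = sizeof(magic); pos < data.size(); pos += width) {
    if (width == 8) {
      uint64_t value = 0;
      std::memcpy(&value, &data[pos], sizeof(value));
      offsets->push_back(value);
    } else {
      uint32_t value = 0;
      std::memcpy(&value, &data[pos], sizeof(value));
      offsets->push_back(value);
    }
  }
  return true;
}
```

Reading the file into the byte vector is left out; any binary read of the whole file will do. For real work the sancov tool described in the next section is the better choice.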

11.Sancov Tool

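A simple sancov tool is provided to process coverage files. The tool is part of the LLVM project and is currently supported only on Linux. It can handle symbolization tasks autonomously, without any extra support from the environment. You need to pass the .sancov files (named <module_name>.<pid>.sancov) and the paths to all corresponding binary ELF files. sancov matches these files using module names and binary file names.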

12.Coverage Reports

Experimental

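.sancov files do not contain enough information to generate a source-level coverage report. The missing information is contained in the debug info of the binary. Thus the .sancov file has to be symbolized to produce a .symcov file first: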

sancov -symbolize my_program.123.sancov my_program > my_program.123.symcov

The .symcov file can be browsed overlaid on the source code by running the tools/sancov/coverage-report-server.py script, which starts an HTTP server.

13.Output directory

By default, .sancov files are created in the current working directory. This can be changed with ASAN_OPTIONS=coverage_dir=/path:

% ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
% ls -l /tmp/cov/*sancov
-rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
-rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Reposted from blog.csdn.net/zhang14916/article/details/100924489