Analyzing and Fixing an ASAN "Symbolizer Not Found" Problem

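AddressSanitizer (ASAN for short) is a convenient tool for detecting and analyzing memory errors in C/C++ code. The WebRTC project integrates ASAN, and a single GN option, is_asan, turns it on or off for the whole project. is_asan defaults to false: adding the line is_asan = true to args.gn enables ASAN for the whole project, while writing is_asan = false, or leaving the option out, disables it.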

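The Linux debug build of the OpenRTCClient project has ASAN enabled. When everything is configured properly and a C/C++ program hits a memory error, ASAN calls a symbolizer to turn the addresses in the relevant stacks (for example the allocation stack and the free stack) into symbol names and file/line locations. We can point the environment variable ASAN_SYMBOLIZER_PATH at the llvm symbolizer of our choice, for example export ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer-11, to tell ASAN which tool to use for symbolization. If ASAN_SYMBOLIZER_PATH is not set, ASAN searches each directory in PATH for an executable named llvm-symbolizer. If ASAN_SYMBOLIZER_PATH does not point to a suitable llvm symbolizer and no executable named llvm-symbolizer can be found in PATH either, ASAN can only print the raw addresses.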
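The lookup order described above can be sketched roughly as follows. This is a simplified illustration, not ASan's real code: PickSymbolizer and the injected exists predicate are hypothetical names, and the PATH separator is assumed to be ':' as on Linux.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical helper (not part of compiler-rt): chooses a symbolizer path
// following the order described in the text. `exists` is injected so the
// logic can be exercised without touching the real filesystem.
std::string PickSymbolizer(
    const char* asan_symbolizer_path, const std::string& path_env,
    const std::function<bool(const std::string&)>& exists) {
  assert(static_cast<bool>(exists));

  // 1. An explicitly configured path wins and is used as-is.
  if (asan_symbolizer_path != nullptr) return asan_symbolizer_path;

  // 2. Otherwise try "<dir>/llvm-symbolizer" for every entry of PATH.
  std::istringstream dirs(path_env);
  std::string dir;
  while (std::getline(dirs, dir, ':')) {
    if (dir.empty()) continue;
    const std::string candidate = dir + "/llvm-symbolizer";
    if (exists(candidate)) return candidate;
  }

  // 3. Nothing found: the report will contain raw addresses only.
  return "";
}
```

Note that in this sketch a configured path is returned without checking that it exists, so a wrong value is not silently replaced by a PATH search.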

A failed symbolization

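loop_connect is a sample application in the OpenRTCClient project. After building it, we set the environment variable ASAN_SYMBOLIZER_PATH before running it, yet when a memory error occurred during the run, the addresses still were not symbolized. ASAN printed the following: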

=================================================================
==51148==ERROR: AddressSanitizer: heap-use-after-free on address 0x61200014eb40 at pc 0x5639128a0a85 bp 0x7ffcfdbb6b30 sp 0x7ffcfdbb6b28
READ of size 8 at 0x61200014eb40 thread T0
==51148==WARNING: invalid path to external symbolizer!
==51148==WARNING: Failed to use and restart external symbolizer!
    #0 0x5639128a0a84  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x32fda84) (BuildId: 542ad276a9f6ad54)
    #1 0x563915cdc29d  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x673929d) (BuildId: 542ad276a9f6ad54)
    #2 0x563910cd2bc1  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x172fbc1) (BuildId: 542ad276a9f6ad54)
    #3 0x563910cd2c08  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x172fc08) (BuildId: 542ad276a9f6ad54)
    #4 0x563910cd52f6  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x17322f6) (BuildId: 542ad276a9f6ad54)
    #5 0x563910cd3b40  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x1730b40) (BuildId: 542ad276a9f6ad54)
    #6 0x563910ccf40d  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x172c40d) (BuildId: 542ad276a9f6ad54)
    #7 0x563910ccbad9  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x1728ad9) (BuildId: 542ad276a9f6ad54)
    #8 0x7efd969cc0b2  (/lib/x86_64-linux-gnu/libc.so.6+0x240b2) (BuildId: 9fdb74e7b217d06c93172a8243f8547f947ee6d1)

0x61200014eb40 is located 0 bytes inside of 320-byte region [0x61200014eb40,0x61200014ec80)
freed by thread T0 here:
    #0 0x563910ca3887  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x1700887) (BuildId: 542ad276a9f6ad54)
    #1 0x5639122c1791  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x2d1e791) (BuildId: 542ad276a9f6ad54)
    #2 0x563910cbbc76  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x1718c76) (BuildId: 542ad276a9f6ad54)
    #3 0x563910cbbb1f  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x1718b1f) (BuildId: 542ad276a9f6ad54)
    #4 0x563910cbdbfa  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x171abfa) (BuildId: 542ad276a9f6ad54)
    #5 0x563910cb74c0  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x17144c0) (BuildId: 542ad276a9f6ad54)
    #6 0x563910cb1384  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x170e384) (BuildId: 542ad276a9f6ad54)
    #7 0x563910ccd4c4  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x172a4c4) (BuildId: 542ad276a9f6ad54)
    #8 0x563910ccd42c  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x172a42c) (BuildId: 542ad276a9f6ad54)
    #9 0x563910ccd105  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x172a105) (BuildId: 542ad276a9f6ad54)
    #10 0x563910cbc8ee  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x17198ee) (BuildId: 542ad276a9f6ad54)
    #11 0x563910cbc6e5  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x17196e5) (BuildId: 542ad276a9f6ad54)
    #12 0x563910ccd858  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x172a858) (BuildId: 542ad276a9f6ad54)
    #13 0x563910ccbc84  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x1728c84) (BuildId: 542ad276a9f6ad54)
    #14 0x563910ccad26  (~/OpenRTCClient/build/linux/x64/debug/loop_connect+0x1727d26) (BuildId: 542ad276a9f6ad54)

ASAN reports that the path it has for the llvm symbolizer is invalid, so the addresses cannot be symbolized.

The ASAN implementation

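AddressSanitizer is part of compiler-rt, a subproject of LLVM. After downloading the llvm-project source from GitHub, the compiler-rt code is in the llvm-project/compiler-rt directory. Generally, LLVM/Clang has to be built in order to build compiler-rt. compiler-rt can be built together with llvm and clang, or it can be built separately.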

To build compiler-rt together with llvm and clang, add compiler-rt to the -DLLVM_ENABLE_RUNTIMES= option passed to cmake.

To build it separately, first build LLVM on its own to obtain the llvm-config binary, then run the following commands:

$ cd llvm-project
$ git checkout -t origin/release/14.x
$ mkdir build-compiler-rt
$ cd build-compiler-rt
$ cmake ../compiler-rt -DLLVM_CONFIG_PATH=/path/to/llvm-config
$ make

(The llvm in the WebRTC code base that OpenRTCClient is based on has already been updated to llvm-14, so we switch to the llvm-14 branch here as well.)

The resulting library files are mainly in llvm-project/build-compiler-rt/lib/linux/, for example:

llvm-project/build-compiler-rt$ ls lib/linux/
clang_rt.crtbegin-x86_64.o                    libclang_rt.hwasan_aliases-x86_64.so       libclang_rt.scudo-x86_64.a
clang_rt.crtend-x86_64.o                      libclang_rt.hwasan_cxx-x86_64.a            libclang_rt.scudo-x86_64.so
libclang_rt.asan_cxx-x86_64.a                 libclang_rt.hwasan_cxx-x86_64.a.syms       libclang_rt.tsan_cxx-x86_64.a
libclang_rt.asan_cxx-x86_64.a.syms            libclang_rt.hwasan-x86_64.a                libclang_rt.tsan_cxx-x86_64.a.syms
libclang_rt.asan-preinit-x86_64.a             libclang_rt.hwasan-x86_64.a.syms           libclang_rt.tsan-x86_64.a
libclang_rt.asan_static-x86_64.a              libclang_rt.hwasan-x86_64.so               libclang_rt.tsan-x86_64.a.syms
libclang_rt.asan-x86_64.a                     libclang_rt.lsan-x86_64.a                  libclang_rt.tsan-x86_64.so
libclang_rt.asan-x86_64.a.syms                libclang_rt.msan_cxx-x86_64.a              libclang_rt.ubsan_minimal-x86_64.a
libclang_rt.asan-x86_64.so                    libclang_rt.msan_cxx-x86_64.a.syms         libclang_rt.ubsan_minimal-x86_64.a.syms
libclang_rt.builtins-x86_64.a                 libclang_rt.msan-x86_64.a                  libclang_rt.ubsan_minimal-x86_64.so
libclang_rt.cfi_diag-x86_64.a                 libclang_rt.msan-x86_64.a.syms             libclang_rt.ubsan_standalone_cxx-x86_64.a
libclang_rt.cfi-x86_64.a                      libclang_rt.orc-x86_64.a                   libclang_rt.ubsan_standalone_cxx-x86_64.a.syms
libclang_rt.dd-x86_64.a                       libclang_rt.profile-x86_64.a               libclang_rt.ubsan_standalone-x86_64.a
libclang_rt.dfsan-x86_64.a                    libclang_rt.safestack-x86_64.a             libclang_rt.ubsan_standalone-x86_64.a.syms
libclang_rt.dfsan-x86_64.a.syms               libclang_rt.scudo_cxx_minimal-x86_64.a     libclang_rt.ubsan_standalone-x86_64.so
libclang_rt.dyndd-x86_64.so                   libclang_rt.scudo_cxx-x86_64.a             libclang_rt.xray-basic-x86_64.a
libclang_rt.gwp_asan-x86_64.a                 libclang_rt.scudo_minimal-x86_64.a         libclang_rt.xray-fdr-x86_64.a
libclang_rt.hwasan_aliases_cxx-x86_64.a       libclang_rt.scudo_minimal-x86_64.so        libclang_rt.xray-profiling-x86_64.a
libclang_rt.hwasan_aliases_cxx-x86_64.a.syms  libclang_rt.scudo_standalone_cxx-x86_64.a  libclang_rt.xray-x86_64.a
libclang_rt.hwasan_aliases-x86_64.a           libclang_rt.scudo_standalone-x86_64.a
libclang_rt.hwasan_aliases-x86_64.a.syms      libclang_rt.scudo_standalone-x86_64.so

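At the compiler/linker level, enabling AddressSanitizer means passing the special flag -fsanitize=address to both the compile step and the link step. For example, the command actually executed to link OpenRTCClient's sample application loop_connect is: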

python3 "../../../../webrtc/build/toolchain/gcc_link_wrapper.py" --output="./loop_connect" -- ../../../../build_system/llvm-build/linux/linux/Release+Asserts/bin/clang++ -fuse-ld=lld -Wl,--fatal-warnings -Wl,--build-id -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,--color-diagnostics -Wl,--no-call-graph-profile-sort -m64 -no-canonical-prefixes -Wl,--gdb-index -rdynamic --sysroot=../../../../build_system/sysroot/linux/debian_sid_amd64-sysroot -fsanitize=address -pie -Wl,--disable-new-dtags -Wl,-u_sanitizer_options_link_helper -fsanitize=address -o "./loop_connect" -Wl,--start-group @"./loop_connect.rsp"  -Wl,--end-group  -lX11 -lXcomposite -lXext -lXrender -latomic -ldl -lpthread -lrt -lgmodule-2.0 -lgthread-2.0 -lgtk-3 -lgdk-3 -lpangocairo-1.0 -lpango-1.0 -lharfbuzz -latk-1.0 -lcairo-gobject -lcairo -lgdk_pixbuf-2.0 -lgio-2.0 -lgobject-2.0 -lglib-2.0 -lm -lz

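When the clang++ driver sees -fsanitize=address at link time, it adds one of the libclang_rt.asan* libraries shown above (the ones produced by building compiler-rt), chosen according to the target architecture. The link command for loop_connect passes --sysroot, so system libraries and headers are looked up under that sysroot; the ASan runtime itself, however, comes from the clang toolchain's own library directory. Specifically, loop_connect is linked against the libclang_rt.asan* library for the target architecture in OpenRTCClient/build_system/llvm-build/linux/linux/Release+Asserts/lib/clang/14.0.0/lib/linux or OpenRTCClient/build_system/llvm-build/linux/linux/Release+Asserts/lib/clang/14.0.0/lib/x86_64-unknown-linux-gnu.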

To be able to debug AddressSanitizer, we need the link step to pick up the compiler-rt libraries we built ourselves. To do that, rename the existing directory (OpenRTCClient/build_system/llvm-build/linux/linux/Release+Asserts/lib/clang/14.0.0/lib/linux or OpenRTCClient/build_system/llvm-build/linux/linux/Release+Asserts/lib/clang/14.0.0/lib/x86_64-unknown-linux-gnu) to anything else, and create a symbolic link named linux under OpenRTCClient/build_system/llvm-build/linux/linux/Release+Asserts/lib/clang/14.0.0/lib/ that points to our compiler-rt build output, llvm-project/build-compiler-rt/lib/linux. After that, whenever we modify the compiler-rt code, rebuild compiler-rt, and relink loop_connect, our modified compiler-rt code is linked in.

Analyzing why AddressSanitizer cannot find the symbolizer

Following the message printed by AddressSanitizer, we search the compiler-rt source for the string literal "WARNING: invalid path to external symbolizer!" and find it in llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp. The relevant code is:

bool SymbolizerProcess::StartSymbolizerSubprocess() {
  if (!FileExists(path_)) {
    if (!reported_invalid_path_) {
      Report("WARNING: invalid path to external symbolizer!\n");
      reported_invalid_path_ = true;
    }
    return false;
  }

  const char *argv[kArgVMax];
  GetArgV(path_, argv);
  pid_t pid;

We can modify this code to print the symbolizer path path_ that AddressSanitizer actually sees here. It turns out to be /media/data/multimedia/OpenRTCClient/build/linux/x64/debug//../../third_party/llvm-build/Release+Asserts/bin/llvm-symbolizer, which appears to have nothing to do with the path we configured through the environment variable ASAN_SYMBOLIZER_PATH.

The value of path_ is passed in through the constructor of the SymbolizerProcess class:

SymbolizerProcess::SymbolizerProcess(const char *path, bool use_posix_spawn)
    : path_(path),
      input_fd_(kInvalidFd),
      output_fd_(kInvalidFd),
      times_restarted_(0),
      failed_to_start_(false),
      reported_invalid_path_(false),
      use_posix_spawn_(use_posix_spawn) {
  CHECK(path_);
  CHECK_NE(path_[0], '\0');
}

Running our executable under GDB with a breakpoint on the SymbolizerProcess constructor gives the following call stack:

#0  __sanitizer::SymbolizerProcess::SymbolizerProcess(char const*, bool)
    (use_posix_spawn=false, path=0x7ffff3403000 "/media/data/multimedia/OpenRTCClient/build/linux/x64/debug//../../third_party/llvm-build/Release+Asserts/bin/llvm-symbolizer", this=0x7ffff7fab000) at ~llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:456
#1  __sanitizer::LLVMSymbolizerProcess::LLVMSymbolizerProcess(char const*)
    (path=0x7ffff3403000 "/media/data/multimedia/OpenRTCClient/build/linux/x64/debug//../../third_party/llvm-build/Release+Asserts/bin/llvm-symbolizer", this=0x7ffff7fab000) at ~llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:240
#2  __sanitizer::LLVMSymbolizer::LLVMSymbolizer(char const*, __sanitizer::LowLevelAllocator*)
    (this=0x7ffff7fb4000, path=0x7ffff3403000 "/media/data/multimedia/OpenRTCClient/build/linux/x64/debug//../../third_party/llvm-build/Release+Asserts/bin/llvm-symbolizer", allocator=<optimized out>) at ~llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:292
#3  0x0000555556c45532 in __sanitizer::ChooseExternalSymbolizer (allocator=<optimized out>)
    at ~llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common.h:1075
#4  __sanitizer::ChooseSymbolizerTools (allocator=<optimized out>, list=<synthetic pointer>)
    at ~llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp:487
#5  __sanitizer::Symbolizer::PlatformInit() ()
    at ~llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp:500
#6  0x0000555556c42455 in __sanitizer::Symbolizer::GetOrInit() ()
    at ~llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:24
#7  0x0000555556c457ad in __sanitizer::Symbolizer::LateInitialize() ()
    at ~llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp:505
#8  0x0000555556c199fd in __asan::AsanInitInternal() () at ~llvm-project/compiler-rt/lib/asan/asan_rtl.cpp:495
#9  0x00007ffff7fe0ce6 in  () at /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7fd013a in  () at /lib64/ld-linux-x86-64.so.2
#11 0x0000000000000001 in  ()

The function __sanitizer::ChooseExternalSymbolizer(), defined in llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp, shows where the path_ of the SymbolizerProcess object comes from:

static SymbolizerTool *ChooseExternalSymbolizer(LowLevelAllocator *allocator) {
  const char *path = common_flags()->external_symbolizer_path;

  if (path && internal_strchr(path, '%')) {
    char *new_path = (char *)InternalAlloc(kMaxPathLength);
    SubstituteForFlagValue(path, new_path, kMaxPathLength);
    path = new_path;
  }

  const char *binary_name = path ? StripModuleName(path) : "";
  static const char kLLVMSymbolizerPrefix[] = "llvm-symbolizer";
  if (path && path[0] == '\0') {
    VReport(2, "External symbolizer is explicitly disabled.\n");
    return nullptr;
  } else if (!internal_strncmp(binary_name, kLLVMSymbolizerPrefix,
                               internal_strlen(kLLVMSymbolizerPrefix))) {
    VReport(2, "Using llvm-symbolizer at user-specified path: %s\n", path);
    return new(*allocator) LLVMSymbolizer(path, allocator);
  } else if (!internal_strcmp(binary_name, "atos")) {
#if SANITIZER_MAC
    VReport(2, "Using atos at user-specified path: %s\n", path);
    return new(*allocator) AtosSymbolizer(path, allocator);
#else  // SANITIZER_MAC
    Report("ERROR: Using `atos` is only supported on Darwin.\n");
    Die();
#endif  // SANITIZER_MAC
  } else if (!internal_strcmp(binary_name, "addr2line")) {
    VReport(2, "Using addr2line at user-specified path: %s\n", path);
    return new(*allocator) Addr2LinePool(path, allocator);
  } else if (path) {
    Report("ERROR: External symbolizer path is set to '%s' which isn't "
           "a known symbolizer. Please set the path to the llvm-symbolizer "
           "binary or other known tool.\n", path);
    Die();
  }

  // Otherwise symbolizer program is unknown, let's search $PATH
  CHECK(path == nullptr);
#if SANITIZER_MAC
  if (const char *found_path = FindPathToBinary("atos")) {
    VReport(2, "Using atos found at: %s\n", found_path);
    return new(*allocator) AtosSymbolizer(found_path, allocator);
  }
#endif  // SANITIZER_MAC
  if (const char *found_path = FindPathToBinary("llvm-symbolizer")) {
    VReport(2, "Using llvm-symbolizer found at: %s\n", found_path);
    return new(*allocator) LLVMSymbolizer(found_path, allocator);
  }
  if (common_flags()->allow_addr2line) {
    if (const char *found_path = FindPathToBinary("addr2line")) {
      VReport(2, "Using addr2line found at: %s\n", found_path);
      return new(*allocator) Addr2LinePool(found_path, allocator);
    }
  }
  return nullptr;
}
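The name-based dispatch in the function above can be condensed into a small sketch. ClassifySymbolizer and SymbolizerKind are illustrative names that do not exist in compiler-rt, and the sketch only classifies an already-set path: it does not cover the "path is unset, search PATH" branch or the platform checks.

```cpp
#include <cassert>
#include <cstring>
#include <string>

enum class SymbolizerKind { kDisabled, kLlvm, kAtos, kAddr2Line, kUnknown };

// Simplified stand-in for the name checks in ChooseExternalSymbolizer().
SymbolizerKind ClassifySymbolizer(const std::string& path) {
  // An explicitly empty path disables the external symbolizer.
  if (path.empty()) return SymbolizerKind::kDisabled;

  // Keep only the file name, as StripModuleName() does in the real code.
  const std::string::size_type slash = path.find_last_of('/');
  const std::string name =
      (slash == std::string::npos) ? path : path.substr(slash + 1);

  // llvm-symbolizer is matched by prefix, so versioned names are accepted.
  static const char kPrefix[] = "llvm-symbolizer";
  if (name.compare(0, std::strlen(kPrefix), kPrefix) == 0)
    return SymbolizerKind::kLlvm;
  if (name == "atos") return SymbolizerKind::kAtos;
  if (name == "addr2line") return SymbolizerKind::kAddr2Line;
  return SymbolizerKind::kUnknown;
}
```

The prefix match is why a versioned binary such as /usr/bin/llvm-symbolizer-11 is treated as llvm-symbolizer, whereas atos and addr2line must match exactly.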

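In __sanitizer::ChooseExternalSymbolizer(), AddressSanitizer determines the path of the symbolizer program from values such as common_flags()->external_symbolizer_path. Here the actual value of common_flags()->external_symbolizer_path is %d/../../third_party/llvm-build/Release+Asserts/bin/llvm-symbolizer. SubstituteForFlagValue() replaces %d with the directory of the running executable, and the path_ of the SymbolizerProcess object seen above is the result of that substitution.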
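To make the relationship between the flag value and the observed path_ concrete, here is a minimal sketch of the %d expansion. ExpandBinaryDir is a hypothetical name. The real SubstituteForFlagValue() writes into a fixed-size buffer and handles other placeholders as well. The trailing slash on the executable's directory is an assumption, inferred from the double slash in the observed path.

```cpp
#include <cassert>
#include <string>

// Illustrative only: replaces every "%d" in a flag value with the directory
// of the running executable, passed in here as `binary_dir`.
std::string ExpandBinaryDir(const std::string& flag_value,
                            const std::string& binary_dir) {
  assert(!binary_dir.empty());
  std::string out;
  for (std::string::size_type i = 0; i < flag_value.size(); ++i) {
    const bool is_placeholder = flag_value[i] == '%' &&
                                i + 1 < flag_value.size() &&
                                flag_value[i + 1] == 'd';
    if (is_placeholder) {
      out += binary_dir;  // substitute the executable's directory
      ++i;                // skip the 'd'
    } else {
      out += flag_value[i];
    }
  }
  return out;
}
```

With the directory /media/data/multimedia/OpenRTCClient/build/linux/x64/debug/ this reproduces the path printed earlier, including the debug//../../ part.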

In llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_flags.h, the common_flags() function is defined as:

// Functions to get/set global CommonFlags shared by all sanitizer runtimes:
extern CommonFlags common_flags_dont_use;
inline const CommonFlags *common_flags() {
  return &common_flags_dont_use;
}

inline void SetCommonFlagsDefaults() {
  common_flags_dont_use.SetDefaults();
}

// This function can only be used to setup tool-specific overrides for
// CommonFlags defaults. Generally, it should only be used right after
// SetCommonFlagsDefaults(), but before ParseCommonFlagsFromString(), and
// only during the flags initialization (i.e. before they are used for
// the first time).
inline void OverrideCommonFlags(const CommonFlags &cf) {
  common_flags_dont_use.CopyFrom(cf);
}

common_flags() returns a global object. The value of this object is mainly updated by the InitializeFlags() function in llvm-project/compiler-rt/lib/asan/asan_flags.cpp, which is defined as follows:

void InitializeFlags() {
  // Set the default values and prepare for parsing ASan and common flags.
  SetCommonFlagsDefaults();
  {
    CommonFlags cf;
    cf.CopyFrom(*common_flags());
    cf.detect_leaks = cf.detect_leaks && CAN_SANITIZE_LEAKS;
    cf.external_symbolizer_path = GetEnv("ASAN_SYMBOLIZER_PATH");
    cf.malloc_context_size = kDefaultMallocContextSize;
    cf.intercept_tls_get_addr = true;
    cf.exitcode = 1;
    OverrideCommonFlags(cf);
  }
  Flags *f = flags();
  f->SetDefaults();

  FlagParser asan_parser;
  RegisterAsanFlags(&asan_parser, f);
  RegisterCommonFlags(&asan_parser);

  // Set the default values and prepare for parsing LSan and UBSan flags
  // (which can also overwrite common flags).
#if CAN_SANITIZE_LEAKS
  __lsan::Flags *lf = __lsan::flags();
  lf->SetDefaults();

  FlagParser lsan_parser;
  __lsan::RegisterLsanFlags(&lsan_parser, lf);
  RegisterCommonFlags(&lsan_parser);
#endif

#if CAN_SANITIZE_UB
  __ubsan::Flags *uf = __ubsan::flags();
  uf->SetDefaults();

  FlagParser ubsan_parser;
  __ubsan::RegisterUbsanFlags(&ubsan_parser, uf);
  RegisterCommonFlags(&ubsan_parser);
#endif

  if (SANITIZER_MAC) {
    // Support macOS MallocScribble and MallocPreScribble:
    // <https://developer.apple.com/library/content/documentation/Performance/
    // Conceptual/ManagingMemory/Articles/MallocDebug.html>
    if (GetEnv("MallocScribble")) {
      f->max_free_fill_size = 0x1000;
    }
    if (GetEnv("MallocPreScribble")) {
      f->malloc_fill_byte = 0xaa;
    }
  }

  // Override from ASan compile definition.
  const char *asan_compile_def = MaybeUseAsanDefaultOptionsCompileDefinition();
  asan_parser.ParseString(asan_compile_def);

  // Override from user-specified string.
  const char *asan_default_options = __asan_default_options();
  asan_parser.ParseString(asan_default_options);
#if CAN_SANITIZE_UB
  const char *ubsan_default_options = __ubsan_default_options();
  ubsan_parser.ParseString(ubsan_default_options);
#endif
#if CAN_SANITIZE_LEAKS
  const char *lsan_default_options = __lsan_default_options();
  lsan_parser.ParseString(lsan_default_options);
#endif

  // Override from command line.
  asan_parser.ParseStringFromEnv("ASAN_OPTIONS");
#if CAN_SANITIZE_LEAKS
  lsan_parser.ParseStringFromEnv("LSAN_OPTIONS");
#endif
#if CAN_SANITIZE_UB
  ubsan_parser.ParseStringFromEnv("UBSAN_OPTIONS");
#endif

  InitializeCommonFlags();

  // TODO(eugenis): dump all flags at verbosity>=2?
  if (Verbosity()) ReportUnrecognizedFlags();

  if (common_flags()->help) {
    // TODO(samsonov): print all of the flags (ASan, LSan, common).
    asan_parser.PrintFlagDescriptions();
  }

  // Flag validation:
  if (!CAN_SANITIZE_LEAKS && common_flags()->detect_leaks) {
    Report("%s: detect_leaks is not supported on this platform.\n",
           SanitizerToolName);
    Die();
  }
  // Ensure that redzone is at least ASAN_SHADOW_GRANULARITY.
  if (f->redzone < (int)ASAN_SHADOW_GRANULARITY)
    f->redzone = ASAN_SHADOW_GRANULARITY;
  // Make "strict_init_order" imply "check_initialization_order".
  // TODO(samsonov): Use a single runtime flag for an init-order checker.
  if (f->strict_init_order) {
    f->check_initialization_order = true;
  }
  CHECK_LE((uptr)common_flags()->malloc_context_size, kStackTraceMax);
  CHECK_LE(f->min_uar_stack_size_log, f->max_uar_stack_size_log);
  CHECK_GE(f->redzone, 16);
  CHECK_GE(f->max_redzone, f->redzone);
  CHECK_LE(f->max_redzone, 2048);
  CHECK(IsPowerOfTwo(f->redzone));
  CHECK(IsPowerOfTwo(f->max_redzone));

  // quarantine_size is deprecated but we still honor it.
  // quarantine_size can not be used together with quarantine_size_mb.
  if (f->quarantine_size >= 0 && f->quarantine_size_mb >= 0) {
    Report("%s: please use either 'quarantine_size' (deprecated) or "
           "quarantine_size_mb, but not both\n", SanitizerToolName);
    Die();
  }
  if (f->quarantine_size >= 0)
    f->quarantine_size_mb = f->quarantine_size >> 20;
  if (f->quarantine_size_mb < 0) {
    const int kDefaultQuarantineSizeMb =
        (ASAN_LOW_MEMORY) ? 1UL << 4 : 1UL << 8;
    f->quarantine_size_mb = kDefaultQuarantineSizeMb;
  }
  if (f->thread_local_quarantine_size_kb < 0) {
    const u32 kDefaultThreadLocalQuarantineSizeKb =
        // It is not advised to go lower than 64Kb, otherwise quarantine batches
        // pushed from thread local quarantine to global one will create too
        // much overhead. One quarantine batch size is 8Kb and it  holds up to
        // 1021 chunk, which amounts to 1/8 memory overhead per batch when
        // thread local quarantine is set to 64Kb.
        (ASAN_LOW_MEMORY) ? 1 << 6 : FIRST_32_SECOND_64(1 << 8, 1 << 10);
    f->thread_local_quarantine_size_kb = kDefaultThreadLocalQuarantineSizeKb;
  }
  if (f->thread_local_quarantine_size_kb == 0 && f->quarantine_size_mb > 0) {
    Report("%s: thread_local_quarantine_size_kb can be set to 0 only when "
           "quarantine_size_mb is set to 0\n", SanitizerToolName);
    Die();
  }
  if (!f->replace_str && common_flags()->intercept_strlen) {
    Report("WARNING: strlen interceptor is enabled even though replace_str=0. "
           "Use intercept_strlen=0 to disable it.");
  }
  if (!f->replace_str && common_flags()->intercept_strchr) {
    Report("WARNING: strchr* interceptors are enabled even though "
           "replace_str=0. Use intercept_strchr=0 to disable them.");
  }
  if (!f->replace_str && common_flags()->intercept_strndup) {
    Report("WARNING: strndup* interceptors are enabled even though "
           "replace_str=0. Use intercept_strndup=0 to disable them.");
  }
}

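InitializeFlags() first sets the default values of CommonFlags common_flags_dont_use, then updates it with a value taken from the environment, namely the ASAN_SYMBOLIZER_PATH variable we configured. After that it applies, in order, the options returned by MaybeUseAsanDefaultOptionsCompileDefinition() and __asan_default_options(), and then the options from environment variables such as ASAN_OPTIONS, each overriding the earlier settings.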
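The "later source overrides earlier source" behaviour can be illustrated with a toy model. ApplyOptions and ResolveSymbolizerPath are hypothetical names. The parser below only splits on whitespace, which is narrower than the real FlagParser, and it omits the compile-definition step for brevity.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

using FlagMap = std::map<std::string, std::string>;

// Toy parser: applies "key=value key=value ..." on top of `flags`.
// A key that is already present is overwritten.
void ApplyOptions(const std::string& options, FlagMap& flags) {
  std::istringstream tokens(options);
  std::string token;
  while (tokens >> token) {
    const std::string::size_type eq = token.find('=');
    if (eq == std::string::npos) continue;  // ignore malformed tokens
    flags[token.substr(0, eq)] = token.substr(eq + 1);
  }
}

// Replays the order used by InitializeFlags(): the ASAN_SYMBOLIZER_PATH
// value first, then the string returned by __asan_default_options(), then
// the contents of ASAN_OPTIONS.
std::string ResolveSymbolizerPath(const std::string& env_symbolizer_path,
                                  const std::string& default_options,
                                  const std::string& asan_options_env) {
  FlagMap flags;
  flags["external_symbolizer_path"] = env_symbolizer_path;
  ApplyOptions(default_options, flags);
  ApplyOptions(asan_options_env, flags);
  return flags["external_symbolizer_path"];
}
```

In this model, any external_symbolizer_path entry in the default options replaces the value taken from ASAN_SYMBOLIZER_PATH, and an entry in ASAN_OPTIONS would in turn replace both, since it is applied last.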

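Printing the value obtained from the environment variable ASAN_SYMBOLIZER_PATH at this point shows that it is exactly the value we configured, /usr/bin/llvm-symbolizer-11.

The MaybeUseAsanDefaultOptionsCompileDefinition() function in llvm-project/compiler-rt/lib/asan/asan_flags.cpp is defined as follows: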

static const char *MaybeUseAsanDefaultOptionsCompileDefinition() {
#ifdef ASAN_DEFAULT_OPTIONS
  return SANITIZER_STRINGIFY(ASAN_DEFAULT_OPTIONS);
#else
  return "";
#endif
}

The __asan_default_options() function in llvm-project/compiler-rt/lib/asan/asan_flags.cpp is defined as follows:

SANITIZER_INTERFACE_WEAK_DEF(const char*, __asan_default_options, void) {
  return "";
}

At first glance, the options returned by these two functions cannot change common_flags()->external_symbolizer_path. In reality, once the return value of __asan_default_options() has been processed, common_flags()->external_symbolizer_path has become %d/../../third_party/llvm-build/Release+Asserts/bin/llvm-symbolizer. Moreover, the string that __asan_default_options() actually returns is not the empty string from the definition shown above, but this one:

check_printf=1 use_sigaltstack=1 strip_path_prefix=/../../ fast_unwind_on_fatal=1 detect_stack_use_after_return=1 symbolize=1 detect_leaks=0 allow_user_segv_handler=1 external_symbolizer_path=%d/../../third_party/llvm-build/Release+Asserts/bin/llvm-symbolizer

We run our executable under GDB and set a breakpoint on __asan_default_options(). Surprisingly, the breakpoint is not placed in llvm-project/compiler-rt/lib/asan/asan_flags.cpp but in the WebRTC code, in webrtc/build/sanitizers/sanitizer_options.cc:

(gdb) break __asan_default_options 
warning: Could not find DWO CU obj/build/config/sanitizers/options_sources/sanitizer_options.dwo(0x4d51bcd290d078c0) referenced by CU at offset 0x43f087 [in module /media/data/multimedia/OpenRTCClient/build/linux/x64/debug/loop_connect]
Breakpoint 1 at 0x81c6e34: file ../../../../webrtc/build/sanitizers/sanitizer_options.cc, line 75.

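Looking again at the declaration and definition of __asan_default_options(), we notice that llvm defines it as a weak symbol, so a regular (strong) definition elsewhere in the program takes its place at link time. The WebRTC file webrtc/build/sanitizers/sanitizer_options.cc contains exactly such a definition of __asan_default_options():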

#if defined(ADDRESS_SANITIZER)
// Default options for AddressSanitizer in various configurations:
//   check_printf=1 - check the memory accesses to printf (and other formatted
//     output routines) arguments.
//   use_sigaltstack=1 - handle signals on an alternate signal stack. Useful
//     for stack overflow detection.
//   strip_path_prefix=/../../ - prefixes up to and including this
//     substring will be stripped from source file paths in symbolized reports
//   fast_unwind_on_fatal=1 - use the fast (frame-pointer-based) stack unwinder
//     to print error reports. V8 doesn't generate debug info for the JIT code,
//     so the slow unwinder may not work properly.
//   detect_stack_use_after_return=1 - use fake stack to delay the reuse of
//     stack allocations and detect stack-use-after-return errors.
//   symbolize=1 - enable in-process symbolization.
//   external_symbolizer_path=... - provides the path to llvm-symbolizer
//     relative to the main executable
#if defined(OS_LINUX) || defined(OS_CHROMEOS)
const char kAsanDefaultOptions[] =
    "check_printf=1 use_sigaltstack=1 strip_path_prefix=/../../ "
    "fast_unwind_on_fatal=1 detect_stack_use_after_return=1 "
    "symbolize=1 detect_leaks=0 allow_user_segv_handler=1 "
    "external_symbolizer_path=%d/../../third_party/llvm-build/Release+Asserts/"
    "bin/llvm-symbolizer";

#elif defined(OS_APPLE)
const char* kAsanDefaultOptions =
    "check_printf=1 use_sigaltstack=1 strip_path_prefix=/../../ "
    "fast_unwind_on_fatal=1 detect_stack_use_after_return=1 ";

#elif defined(OS_WIN)
const char* kAsanDefaultOptions =
    "check_printf=1 use_sigaltstack=1 strip_path_prefix=\\..\\..\\ "
    "fast_unwind_on_fatal=1 detect_stack_use_after_return=1 "
    "symbolize=1 external_symbolizer_path=%d/../../third_party/"
    "llvm-build/Release+Asserts/bin/llvm-symbolizer.exe";
#endif  // defined(OS_LINUX) || defined(OS_CHROMEOS)

#if defined(OS_LINUX) || defined(OS_CHROMEOS) || defined(OS_APPLE) || \
    defined(OS_WIN)
// Allow NaCl to override the default asan options.
extern const char* kAsanDefaultOptionsNaCl;
__attribute__((weak)) const char* kAsanDefaultOptionsNaCl = nullptr;

SANITIZER_HOOK_ATTRIBUTE const char *__asan_default_options() {
  if (kAsanDefaultOptionsNaCl)
    return kAsanDefaultOptionsNaCl;
  return kAsanDefaultOptions;
}

extern char kASanDefaultSuppressions[];

SANITIZER_HOOK_ATTRIBUTE const char *__asan_default_suppressions() {
  return kASanDefaultSuppressions;
}
#endif  // defined(OS_LINUX) || defined(OS_CHROMEOS) || defined(OS_APPLE) ||
        // defined(OS_WIN)
#endif  // ADDRESS_SANITIZER

This confirms that the symbolizer we configured through the environment variable ASAN_SYMBOLIZER_PATH is overridden by the options hard-coded in the WebRTC code.

The relevant change in WebRTC was introduced by this commit: https://chromium.googlesource.com/chromium/src/build/+/919d061c2f455cc07b687a48322785b3b61f1455%5E%21/sanitizers/sanitizer_options.cc

The fix follows directly: remove the part of webrtc/build/sanitizers/sanitizer_options.cc that configures AddressSanitizer's symbolizer.
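The exact diff is not shown here, so the following is only a sketch of what the edited Linux option string might look like, together with a small check that can be used to confirm no symbolizer path is pinned any more. kAsanDefaultOptionsNoSymbolizer and PinsSymbolizerPath are hypothetical names, not identifiers from WebRTC.

```cpp
#include <cassert>
#include <string>

// Hypothetical edited constant: the Linux option string quoted earlier,
// minus the external_symbolizer_path entry. Not the actual patch.
const char kAsanDefaultOptionsNoSymbolizer[] =
    "check_printf=1 use_sigaltstack=1 strip_path_prefix=/../../ "
    "fast_unwind_on_fatal=1 detect_stack_use_after_return=1 "
    "symbolize=1 detect_leaks=0 allow_user_segv_handler=1";

// Returns true if an option string still pins the symbolizer location.
bool PinsSymbolizerPath(const std::string& options) {
  return options.find("external_symbolizer_path=") != std::string::npos;
}
```

With that entry gone, nothing in the default options overwrites the value that InitializeFlags() reads from ASAN_SYMBOLIZER_PATH.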

Reposted from blog.csdn.net/tq08g2z/article/details/124525987