4. CMake Guidelines and ABI_Dev_Notes for ONNX Runtime

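Translated from the ONNX Runtime team's build guidelines.
There are usually several ways to do the same thing, which is why this guideline exists. It is not about which way is right or wrong; it is about keeping the project's code moving in the same direction. What follows are the build conventions recommended by the ONNX Runtime team.
First, the required CMake version: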
cmake_minimum_required(VERSION 3.13)

Scope the impact to a minimum

If you want to change a setting, try to scope the impact down so that it stays local.

  • Use target_include_directories instead of include_directories
  • Use target_compile_definitions instead of add_definitions
  • Use target_compile_options instead of add_compile_options
  • Don't use variables that change global flags, such as CMAKE_CXX_FLAGS

For example, to add a macro definition to a single VC project, use target_compile_definitions rather than add_definitions.
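
The target-scoped commands listed above could be used roughly as follows. This is a minimal sketch, not ONNX Runtime's actual build code: the target name demo_core, the source path, and the macro DEMO_ENABLE_LOGGING are all hypothetical.

```cmake
cmake_minimum_required(VERSION 3.13)
project(scope_demo LANGUAGES CXX)

# Hypothetical target; the file and macro names are placeholders.
add_library(demo_core STATIC src/core.cc)

# Each setting applies to demo_core only. PRIVATE keeps it from
# propagating to targets that link against demo_core.
target_include_directories(demo_core PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/include)
target_compile_definitions(demo_core PRIVATE DEMO_ENABLE_LOGGING=1)
if(MSVC)
  target_compile_options(demo_core PRIVATE /W4)
else()
  target_compile_options(demo_core PRIVATE -Wall -Wextra)
endif()
```

No global variable such as CMAKE_CXX_FLAGS is touched, so other targets in the same build tree keep their own settings.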

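Static library order matters

First, be aware that when you link static libraries into an executable (or shared library) target, the order matters.
Say A and B are static libraries and C is an executable.

  • A depends on B.
  • C depends on A and B.

Then we should write: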

target_link_libraries(C PRIVATE A B)

rather than

target_link_libraries(C PRIVATE B A)  # Wrong!

On Windows, the order of static libraries matters if a symbol is defined in more than one library.
On Linux, it matters when one static library references another.
So, in general, always list the libraries in the right order (according to their dependency relationships).

Don't call target_link_libraries on static libraries

You could do it, but please don't.

As noted above, library order matters. If you explicitly list all the libs on one line and some of them are in the wrong position, the mistake is easy to locate and fix.

However, if any static lib was built with target_link_libraries:

  • First, be aware that there is no link step for a static lib.
  • Second, once you hit an ordering problem, it is much harder to fix, because many of the dependencies are implicit and their positions are out of our control.
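
One way to follow this rule is to declare the static libraries with no link dependencies and to list them all, in dependency order, only on the final target. The sketch below reuses the A/B/C names from the previous section; the source file paths are hypothetical.

```cmake
cmake_minimum_required(VERSION 3.13)
project(link_order_demo LANGUAGES CXX)

# Hypothetical targets mirroring the A/B/C example: A depends on B,
# and the executable C depends on both.
add_library(B STATIC src/b.cc)
add_library(A STATIC src/a.cc)
# Note: no target_link_libraries(A ...) here. A static library is only
# an archive of object files, so nothing is linked at this point.

add_executable(C src/main.cc)
# All static libraries are listed once, on one line, in dependency order:
# a library comes before the libraries it depends on.
target_link_libraries(C PRIVATE A B)
```

With everything on one line, a misplaced library can be fixed by reordering that single call.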

Every Linux program (and shared lib) should link to libpthread and libatomic

In the Linux world, there are two sets of pthread symbols: a fake one in the standard C library, and a real one in pthread.so. If the real one is not loaded while the process is starting up, the process should not use multithreading, because the core part is missing.

So we append “Threads::Threads” to the lib list of every shared lib (.so, .dll) and exe target. This is easy to miss, and if it is missed, the behavior is undefined.

A related point: if std::atomic is in use, also add the atomic lib there, because some uses of std::atomic require linking to libatomic. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_concurrency.html

NOTE: In rare cases, even if you tell the linker to link the program with pthread, it may not listen to you. It may ignore your ordering and cause issues. See https://github.com/protocolbuffers/protobuf/issues/5923.

Don't use the “-pthread” flag directly

Because:

  1. It doesn't work with nvcc (the CUDA compiler).
  2. It is not portable.

Don't bother adding this flag to your compile-time flags. On Linux, it is useless. On some very old Unix-like systems it may help, but we currently support only Ubuntu 16.04.

Use “Threads::Threads” for linking. Use nothing for compiling.
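
As an illustration of "link with Threads::Threads, compile with nothing", a build script might look like the sketch below. The target and file names are hypothetical, and linking the plain library name atomic on Linux is an assumption about how the libatomic advice could be applied, not something the original text spells out.

```cmake
cmake_minimum_required(VERSION 3.13)
project(threads_demo LANGUAGES CXX)

# THREADS_PREFER_PTHREAD_FLAG is deliberately left unset, so CMake is
# not asked to prefer the -pthread flag.
find_package(Threads REQUIRED)

# Hypothetical targets; the file names are placeholders.
add_library(demo_shared SHARED src/api.cc)
add_executable(demo_tool src/main.cc)

# Link step only: nothing thread-related is added to the compile flags.
target_link_libraries(demo_shared PRIVATE Threads::Threads)
target_link_libraries(demo_tool PRIVATE demo_shared Threads::Threads)

# Assumption: on Linux, code using std::atomic may need libatomic.
if(UNIX AND NOT APPLE)
  target_link_libraries(demo_shared PRIVATE atomic)
  target_link_libraries(demo_tool PRIVATE atomic)
endif()
```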

CUDA projects should use the new cmake CUDA approach

There are two ways of enabling CUDA in cmake.

  1. (new): enable_language(CUDA)
  2. (old): find_package(CUDA)

Use the first one, because the second one is deprecated. Don't use “find_package(CUDA)”. This also means you should not use variables such as:

  • CUDA_NVCC_FLAGS
  • CUDA_INCLUDE_DIRS
  • CUDA_LIBRARIES

So be careful when you copy code from another project into ours; it may not work without changes.
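
A sketch of the first approach, for comparison with code written against find_package(CUDA). The target names, source files, and the -lineinfo option are hypothetical choices for illustration; CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES is the variable CMake itself populates once the CUDA language is enabled.

```cmake
cmake_minimum_required(VERSION 3.13)
project(cuda_demo LANGUAGES CXX)

# New approach: CUDA is a first-class language, so .cu files are
# compiled by nvcc like any other source.
enable_language(CUDA)

# Hypothetical targets; file names are placeholders.
add_library(demo_kernels STATIC src/kernels.cu)
add_executable(demo_host src/main.cc)

# Instead of CUDA_NVCC_FLAGS: per-target options, limited to CUDA sources.
target_compile_options(demo_kernels PRIVATE
  $<$<COMPILE_LANGUAGE:CUDA>:-lineinfo>)

# Instead of CUDA_INCLUDE_DIRS: the toolkit include path that CMake
# detected when the CUDA language was enabled.
target_include_directories(demo_host PRIVATE
  ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})

target_link_libraries(demo_host PRIVATE demo_kernels)
```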

ABI_Dev_Notes

Global Variables

On Windows, global variables may get constructed or destructed inside “DllMain”. There are significant limits on what you can safely do in a DLL entry point. See ‘DLL General Best Practices’. For example, you can't put an ONNX Runtime InferenceSession into a global variable.

Thread local variables

Thread local variables must be function local, so that on Windows they are initialized on first use. Otherwise, they may not work.
Also, if the variable has a non-trivial destructor, you must destroy these thread local variables before onnxruntime.dll is unloaded. That means only onnxruntime internal threads can access these variables; that is, the thread must be created by onnxruntime and destroyed by onnxruntime.

No undefined symbols

On Windows, you can't build a DLL with undefined symbols. Every symbol must be resolved at link time. On Linux, you can.
To simplify things, we require that every symbol be resolved at link time. The same rule applies to all platforms. This also makes it easier for us to control symbol visibility.
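
The text does not say how this rule is enforced on Linux. One possible way, shown here purely as an assumption, is to ask a GNU-compatible linker to reject unresolved symbols when it builds the shared library. The target name demo_api is hypothetical.

```cmake
cmake_minimum_required(VERSION 3.13)
project(no_undefined_demo LANGUAGES CXX)

# Hypothetical shared library; names are placeholders.
add_library(demo_api SHARED src/api.cc)

# Assumption: GNU-compatible linker (GNU ld, gold, or lld) on Linux.
# --no-undefined turns unresolved symbols in the shared library into
# link errors, which matches what the Windows linker already enforces.
if(UNIX AND NOT APPLE)
  target_link_options(demo_api PRIVATE "LINKER:--no-undefined")
endif()
```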

Default visibility and how to export a symbol

On Linux, by default, every symbol is global from the linker's point of view. That is easy to use, but it also makes conflicts and core dumps much more likely. We have encountered too many such problems in the ONNX python binding. Indeed, with a good design, each shared lib needs to export only one function. The ONNX Runtime python binding is a good example. See the pybind11 FAQ for more info.

To control visibility, we use linker version scripts on Linux and def files on Windows. They work similarly, and enforce that:

  1. Only C functions can be exported.
  2. All the function names must be explicitly listed in a text file.
  3. No C++ class/struct or global variable is exported.

Also, on Linux and Mac operating systems, all the code must be compiled with “-fPIC”.
On Windows, we don't use dllexport but we still need dllimport.

Therefore, our DLLEXPORT macro is like:

#ifdef _WIN32
// Define ORT_DLL_IMPORT if your program is dynamically linked to Ort.
#ifdef ORT_DLL_IMPORT
#define ORT_EXPORT __declspec(dllimport)
#else
#define ORT_EXPORT
#endif
#else
#define ORT_EXPORT
#endif
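
On the build side, the export list could be wired in along the lines below. This is a sketch under stated assumptions: the library name demo_rt, the files demo_rt.def and demo_rt.lds, and the function DemoCreateSession are hypothetical, and the Linux branch assumes a GNU-compatible linker.

```cmake
cmake_minimum_required(VERSION 3.13)
project(export_demo LANGUAGES CXX)

# Hypothetical library and file names.
add_library(demo_rt SHARED src/c_api.cc)

# Per-target PIC instead of a global flag. (CMake already enables PIC for
# SHARED targets; static libraries linked into demo_rt need it set too.)
set_target_properties(demo_rt PROPERTIES POSITION_INDEPENDENT_CODE ON)

if(WIN32)
  # A .def file listed as a source is passed to the linker as the module
  # definition file; it names every exported C function.
  target_sources(demo_rt PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/demo_rt.def)
elseif(UNIX AND NOT APPLE)
  # demo_rt.lds would contain something like:
  #   VERS_1.0 { global: DemoCreateSession; local: *; };
  target_link_options(demo_rt PRIVATE
    "LINKER:--version-script=${CMAKE_CURRENT_SOURCE_DIR}/demo_rt.lds")
endif()
```

Because the exported names live in a text file rather than in source annotations, the export list can be reviewed in one place.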

RTLD_LOCAL vs RTLD_GLOBAL

RTLD_LOCAL and RTLD_GLOBAL are two flags of the dlopen(3) function on POSIX systems. By default (on Linux), it is RTLD_LOCAL. Basically, there is nothing corresponding to RTLD_GLOBAL on Windows.

There is one case where you need to use RTLD_GLOBAL on POSIX systems:

  1. There is a shared lib which is dynamically loaded by some application (like python or dotnet).
  2. The shared lib is statically linked to ONNX Runtime.
  3. The shared lib needs to dynamically load a custom op.

Then the shared lib should be loaded with RTLD_GLOBAL, not RTLD_LOCAL. Otherwise, the custom op library cannot find the ONNX Runtime symbols.

Reposted from blog.csdn.net/xxradon/article/details/104100442