5.2 TensorRT Basics (2) - Using the ONNX Parser to Read ONNX Files (Source Code Compilation)

foreword

Teacher Du has launched a from-scratch TensorRT high-performance deployment course. I went through it before without taking notes and have forgotten a lot, so this time I am going through it again and taking notes along the way.

This lesson covers TensorRT basics: using the onnx parser to read onnx files (with source code compilation).

The course outline can be seen in the mind map below

[Mind map: course outline]

1. ONNX parser

In this lesson we will learn about the onnx parser

There are two options for the onnx parser: the precompiled libnvonnxparser.so, or the source code at https://github.com/onnx/onnx-tensorrt. The point of using the source code is better customization: it simplifies plugin development and model compilation, and makes it possible to debug when problems arise.

After the source code is compiled, it also produces a .so file. If something goes wrong inside libnvonnxparser.so, you cannot debug it; the biggest advantage of building from source is that you can debug, locate problems, and analyze the surrounding context.

Let's compare the two repos written by Teacher Du:

The infer repo parses the onnx model by calling the libnvonnxparser.so library file. It is relatively simple and easier to get started with.

The tensorRT_Pro repo compiles and modifies the parser source code to parse the onnx model. It is more difficult, but it is also more customizable and makes writing plugins more convenient.

2. libnvonnxparser.so

Let's first demonstrate how to parse the onnx model with libnvonnxparser.so in order to build the model.

First use gen-onnx.py to export a simple onnx model for easy demonstration. The code is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.onnx
import os

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()

        self.conv = nn.Conv2d(1, 1, 3, padding=1)
        self.relu = nn.ReLU()
        self.conv.weight.data.fill_(1)
        self.conv.bias.data.fill_(0)
    
    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x

# this package corresponds to the opset 11 export code; modify it here if you want to change export details
# import torch.onnx.symbolic_opset11
print("The corresponding opset folder is here:", os.path.dirname(torch.onnx.__file__))

model = Model()
dummy = torch.zeros(1, 1, 3, 3)
torch.onnx.export(
    model, 

    # args: the inputs passed to the model; it must be a tuple, hence the parentheses
    (dummy,), 

    # path of the exported file
    "workspace/demo.onnx", 

    # print detailed information
    verbose=True, 

    # name the input and output nodes, which makes them easier to inspect and manipulate later
    input_names=["image"], 
    output_names=["output"], 

    # the opset controls how each operator is exported, corresponding to symbolic_opset11
    opset_version=11, 

    # batch, height and width are dynamic dimensions; they are written as -1 in the onnx model
    dynamic_axes={
        "image":  {0: "batch", 2: "height", 3: "width"},
        "output": {0: "batch", 2: "height", 3: "width"},
    }
)

print("Done.!")

The exported onnx model is as follows:

[Figure 2-1: Simple onnx model]

The next step is to use the onnx parser to parse the onnx model. Before that, you need to link the libnvonnxparser.so library in the Makefile. The content of main.cpp is as follows:


// tensorRT include
// header used for building the engine
#include <NvInfer.h>

// header for the onnx parser
#include <NvOnnxParser.h>

// runtime header used for inference
#include <NvInferRuntime.h>

// cuda include
#include <cuda_runtime.h>

// system include
#include <stdio.h>
#include <math.h>

#include <iostream>
#include <fstream>
#include <vector>

using namespace std;

inline const char* severity_string(nvinfer1::ILogger::Severity t){
    switch(t){
        case nvinfer1::ILogger::Severity::kINTERNAL_ERROR: return "internal_error";
        case nvinfer1::ILogger::Severity::kERROR:   return "error";
        case nvinfer1::ILogger::Severity::kWARNING: return "warning";
        case nvinfer1::ILogger::Severity::kINFO:    return "info";
        case nvinfer1::ILogger::Severity::kVERBOSE: return "verbose";
        default: return "unknown";
    }
}

class TRTLogger : public nvinfer1::ILogger{
public:
    virtual void log(Severity severity, nvinfer1::AsciiChar const* msg) noexcept override{
        if(severity <= Severity::kINFO){
            // print colored text, with the following format:
            // printf("\033[47;33msome text\033[0m");
            // where \033[  is the start marker
            //       47    is the background color
            //       ;     is the separator
            //       33    is the text color
            //       m     ends the start marker
            //       \033[0m is the reset marker
            // background color or text color can be omitted
            // some color codes: https://blog.csdn.net/ericbar/article/details/79652086
            if(severity == Severity::kWARNING){
                printf("\033[33m%s: %s\033[0m\n", severity_string(severity), msg);
            }
            else if(severity <= Severity::kERROR){
                printf("\033[31m%s: %s\033[0m\n", severity_string(severity), msg);
            }
            else{
                printf("%s: %s\n", severity_string(severity), msg);
            }
        }
    }
} logger;

// code from the previous lesson
bool build_model(){
    TRTLogger logger;

    // ----------------------------- 1. define builder, config and network -----------------------------
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
    nvinfer1::INetworkDefinition* network = builder->createNetworkV2(1);


    // ----------------------------- 2. inputs, model structure and outputs -----------------------------
    // the parsing result of onnxparser is filled into the network, much like adding layers via addConv etc.
    nvonnxparser::IParser* parser = nvonnxparser::createParser(*network, logger);
    if(!parser->parseFromFile("demo.onnx", 1)){
        printf("Failed to parse demo.onnx\n");

        // note: the pointers above have not been released here, so there is a memory leak;
        // a more elegant solution is left for later
        return false;
    }
    
    int maxBatchSize = 10;
    printf("Workspace Size = %.2f MB\n", (1 << 28) / 1024.0f / 1024.0f);
    config->setMaxWorkspaceSize(1 << 28);

    // --------------------------------- 2.1 about the profile ----------------------------------
    // if the model has multiple inputs, multiple profiles are required
    auto profile = builder->createOptimizationProfile();
    auto input_tensor = network->getInput(0);
    int input_channel = input_tensor->getDimensions().d[1];
    
    // configure the minimum, optimal and maximum input ranges
    profile->setDimensions(input_tensor->getName(), nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4(1, input_channel, 3, 3));
    profile->setDimensions(input_tensor->getName(), nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4(1, input_channel, 3, 3));
    profile->setDimensions(input_tensor->getName(), nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4(maxBatchSize, input_channel, 5, 5));
    // add the profile to the config
    config->addOptimizationProfile(profile);

    nvinfer1::ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    if(engine == nullptr){
        printf("Build engine failed.\n");
        return false;
    }

    // -------------------------- 3. serialization ----------------------------------
    // serialize the model and save it to a file
    nvinfer1::IHostMemory* model_data = engine->serialize();
    FILE* f = fopen("engine.trtmodel", "wb");
    fwrite(model_data->data(), 1, model_data->size(), f);
    fclose(f);

    // release in the reverse order of construction
    model_data->destroy();
    parser->destroy();
    engine->destroy();
    network->destroy();
    config->destroy();
    builder->destroy();
    printf("Done.\n");
    return true;
}

int main(){
    build_model();
    return 0;
}

This is similar to the model-building process from the previous lesson, except that the network is now built by parsing the onnx model with libnvonnxparser.so.

You need to include the onnx parser header, #include <NvOnnxParser.h>. In addition, the network is no longer constructed with the C++ API layer by layer; it is parsed by the onnx parser instead, as shown in the following figure:

[Figure 2-2: Differences in onnx network construction]

Of course, you also need to link the library file libnvonnxparser.so in the Makefile:

[Figure 2-3: Differences in Makefiles]

The running effect of the case is as follows:

[Figure 2-4: Running effect of the libnvonnxparser.so case]

After compilation, engine.trtmodel is generated under the workspace folder; it is produced by parsing the onnx model file. Compared with building the network layer by layer through the C++ API, this is much easier, but you will find that under the hood the parser still calls the C++ API to build the network.
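To make that last point concrete, here is a minimal sketch (not the previous lesson's exact code, just an illustration, assuming builder, config and network have already been created as above) of what hand-building this demo network with the C++ API roughly looks like; the parser ends up issuing equivalent addConvolutionNd / addActivation calls for us:

// hypothetical weight/bias buffers for the 3x3 conv exported by gen-onnx.py;
// in real code they would come from the onnx initializers
float kernel_values[9] = {1, 1, 1, 1, 1, 1, 1, 1, 1};
float bias_values[1]   = {0};
nvinfer1::Weights kernel{nvinfer1::DataType::kFLOAT, kernel_values, 9};
nvinfer1::Weights bias{nvinfer1::DataType::kFLOAT, bias_values, 1};

// hand-built equivalent of what the parser produces for demo.onnx
nvinfer1::ITensor* input = network->addInput("image", nvinfer1::DataType::kFLOAT, nvinfer1::Dims4(-1, 1, -1, -1));
auto* conv = network->addConvolutionNd(*input, 1, nvinfer1::DimsHW(3, 3), kernel, bias);
conv->setPaddingNd(nvinfer1::DimsHW(1, 1));
auto* relu = network->addActivation(*conv->getOutput(0), nvinfer1::ActivationType::kRELU);
relu->getOutput(0)->setName("output");
network->markOutput(*relu->getOutput(0));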

3. Source code compilation

Next, let's learn how to parse the onnx model using the parser source code.

This case also provides gen-onnx.py to generate a simple onnx model. You will find four files in the src/onnx directory, as shown in the figure below. These four files are generated from the proto files; see onnx/make_pb.sh for how they are generated.

They are in fact produced by compiling the two proto files with the protobuf compiler protoc that we mentioned in the last lesson. The onnx parser relies on these four files to parse onnx, so they are the foundation.

As mentioned in the last lesson, onnx is essentially a protobuf file, described mainly by the two protobuf definitions onnx-ml.proto and onnx-operators-ml.proto. Since we want to read and manipulate onnx files from languages such as Python and C++, we use the protoc compiler together with onnx-ml.proto and onnx-operators-ml.proto to generate the corresponding Python or C++ code; the specific conversion process was covered in the previous lesson.
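As a small illustration, the generated C++ classes can be used directly to open demo.onnx as a protobuf message. This is only a sketch under a couple of assumptions: that the generated header is named onnx-ml.pb.h, and that the generated namespace is onnx (onnx-tensorrt may remap it through a namespace macro):

#include <cstdio>
#include <fstream>
#include "onnx-ml.pb.h"   // generated from onnx-ml.proto by protoc (file name is an assumption)

int main(){
    // an onnx file is just a serialized onnx::ModelProto message
    onnx::ModelProto model;
    std::ifstream in("workspace/demo.onnx", std::ios::binary);
    if(model.ParseFromIstream(&in)){
        printf("ir_version = %lld, opset imports = %d, graph nodes = %d\n",
            (long long)model.ir_version(),
            model.opset_import_size(),
            model.graph().node_size());
    }
    return 0;
}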

[Figure 3-1: Case directory structure]

onnx-tensorrt-release-8.0 is the source code downloaded from https://github.com/onnx/onnx-tensorrt. Some unnecessary files have been deleted, but the remaining content has not been modified. You can see that the source code also ships its own NvOnnxParser.h.

Next, let's look at the differences in main.cpp. The header include has been changed to use the header from the source code, as shown in the figure below. At the same time, libnvonnxparser.so has been removed from the Makefile. Everything else is the same as in the previous case.

[Figure 3-2: main.cpp differences]
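For reference, the change amounts to something like the following (a sketch; the exact relative path depends on how the include directories are configured in the Makefile):

// previous case: header shipped with TensorRT, linked against libnvonnxparser.so
// #include <NvOnnxParser.h>

// this case: header from the onnx-tensorrt source tree that is compiled into the project
#include "onnx-tensorrt-release-8.0/NvOnnxParser.h"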

The running effect is as follows:

[Figure 3-3: Running effect of the source code example]

4. Supplementary knowledge

So far we have demonstrated parsing the onnx file with both the .so library and the source code. Getting the source code and knowing that it can parse onnx is not enough; we also need to understand how to use it and how to modify it.

Although there is a lot of source code and it looks complicated, most of the time we only need to pay attention to builtin_op_importers.cpp. Every operator supported by tensorRT appears in this file, so it is well worth reading.

We added a print statement to the Conv operator. From the running effect in Figure 3-3, you can see that the statement is printed normally, indicating that the modification took effect as expected.

DEFINE_BUILTIN_OP_IMPORTER(Conv) may look a bit strange; it is actually written with a macro. It corresponds to importConv(IImporterContext* ctx, ::onnx::NodeProto const& node, std::vector<TensorOrWeights>& inputs), which takes a context and the node as input. The input x of Conv is a tensor, while the weight of Conv is not treated as a tensor but as Weights, because it comes from an initializer; that is how the two are distinguished.
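Roughly speaking, the macro declares a function named import##op and registers it in a lookup table keyed by the operator name, so that the parser can dispatch each onnx node to the matching importer. A simplified sketch (paraphrased, not the exact source; the real definition differs slightly between onnx-tensorrt versions) looks like this:

// simplified sketch of the macro; see builtin_op_importers.cpp for the real definition
#define DEFINE_BUILTIN_OP_IMPORTER(op)                                             \
    NodeImportResult import##op(IImporterContext* ctx,                             \
        ::onnx::NodeProto const& node, std::vector<TensorOrWeights>& inputs);      \
    static const bool op##_registered_builtin_op =                                 \
        registerBuiltinOpImporter(#op, import##op);                                \
    NodeImportResult import##op(IImporterContext* ctx,                             \
        ::onnx::NodeProto const& node, std::vector<TensorOrWeights>& inputs)

The Conv importer in builtin_op_importers.cpp, with our added print statement, is shown below: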

DEFINE_BUILTIN_OP_IMPORTER(Conv)
{
    printf("src/onnx-tensorrt-release-8.0/builtin_op_importers.cpp:521 === the Conv operator executes this code during network construction ===\n");
    if (inputs.at(1).is_tensor())
    {
        if (inputs.size() == 3)
        {
            ASSERT(
                inputs.at(2).is_weights() && "The bias tensor is required to be an initializer for the Conv operator",
                ErrorCode::kUNSUPPORTED_NODE);
        }
        // Handle Multi-input convolution
        return convDeconvMultiInput(ctx, node, inputs, true /*isConv*/);
    }

    nvinfer1::ITensor* tensorPtr = &convertToTensor(inputs.at(0), ctx);

    auto kernelWeights = inputs.at(1).weights();

    nvinfer1::Dims dims = tensorPtr->getDimensions();
    LOG_VERBOSE("Convolution input dimensions: " << dims);
    ASSERT(dims.nbDims >= 0 && "TensorRT could not compute output dimensions of Conv", ErrorCode::kUNSUPPORTED_NODE);

    const bool needToExpandDims = (dims.nbDims == 3);
    if (needToExpandDims)
    {
        // Expand spatial dims from 1D to 2D
        std::vector<int> axes{3};
        tensorPtr = unsqueezeTensor(ctx, node, *tensorPtr, axes);
        ASSERT(tensorPtr && "Failed to unsqueeze tensor.", ErrorCode::kUNSUPPORTED_NODE);
        dims = tensorPtr->getDimensions();
    }
    if (kernelWeights.shape.nbDims == 3)
    {
        kernelWeights.shape.nbDims = 4;
        kernelWeights.shape.d[3] = 1;
    }

    const int nbSpatialDims = dims.nbDims - 2;
    // Check that the number of spatial dimensions and the kernel shape matches up.
    ASSERT( (nbSpatialDims == kernelWeights.shape.nbDims - 2) && "The number of spatial dimensions and the kernel shape doesn't match up for the Conv operator.", ErrorCode::kUNSUPPORTED_NODE);

    nvinfer1::Weights bias_weights;
    if (inputs.size() == 3)
    {
        ASSERT(inputs.at(2).is_weights() && "The bias tensor is required to be an initializer for the Conv operator.", ErrorCode::kUNSUPPORTED_NODE);
        auto shapedBiasWeights = inputs.at(2).weights();
        // Unsqueeze scalar weights to 1D
        if (shapedBiasWeights.shape.nbDims == 0)
        {
            shapedBiasWeights.shape = {1, {1}};
        }
        ASSERT( (shapedBiasWeights.shape.nbDims == 1) && "The bias tensor is required to be 1D.", ErrorCode::kINVALID_NODE);
        ASSERT( (shapedBiasWeights.shape.d[0] == kernelWeights.shape.d[0]) && "The shape of the bias tensor misaligns with the weight tensor.", ErrorCode::kINVALID_NODE);
        bias_weights = shapedBiasWeights;
    }
    else
    {
        bias_weights = ShapedWeights::empty(kernelWeights.type);
    }
    nvinfer1::Dims kernelSize;
    kernelSize.nbDims = nbSpatialDims;
    for (int i = 1; i <= nbSpatialDims; ++i)
    {
        kernelSize.d[nbSpatialDims - i] = kernelWeights.shape.d[kernelWeights.shape.nbDims - i];
    }
    nvinfer1::Dims strides = makeDims(nbSpatialDims, 1);
    nvinfer1::Dims begPadding = makeDims(nbSpatialDims, 0);
    nvinfer1::Dims endPadding = makeDims(nbSpatialDims, 0);
    nvinfer1::Dims dilations = makeDims(nbSpatialDims, 1);
    nvinfer1::PaddingMode paddingMode;
    bool exclude_padding;
    getKernelParams(
        ctx, node, &kernelSize, &strides, &begPadding, &endPadding, paddingMode, exclude_padding, &dilations);

    for (int i = 1; i <= nbSpatialDims; ++i)
    {
        ASSERT( (kernelSize.d[nbSpatialDims - i] == kernelWeights.shape.d[kernelWeights.shape.nbDims - i])
            && "The size of spatial dimension and the size of kernel shape are not equal for the Conv operator.",
            ErrorCode::kUNSUPPORTED_NODE);
    }

    int nchan = dims.d[1];
    int noutput = kernelWeights.shape.d[0];
    nvinfer1::IConvolutionLayer* layer
        = ctx->network()->addConvolutionNd(*tensorPtr, noutput, kernelSize, kernelWeights, bias_weights);

    ASSERT(layer && "Failed to add a convolution layer.", ErrorCode::kUNSUPPORTED_NODE);
    layer->setStrideNd(strides);
    layer->setPaddingMode(paddingMode);
    layer->setPrePadding(begPadding);
    layer->setPostPadding(endPadding);
    layer->setDilationNd(dilations);
    OnnxAttrs attrs(node, ctx);
    int ngroup = attrs.get("group", 1);
    ASSERT( (nchan == -1 || kernelWeights.shape.d[1] * ngroup == nchan) && "Kernel weight dimension failed to broadcast to input.", ErrorCode::kINVALID_NODE);
    layer->setNbGroups(ngroup);
    // Register layer name as well as kernel weights and bias weights (if any)
    ctx->registerLayer(layer, getNodeName(node));
    ctx->network()->setWeightsName(kernelWeights, inputs.at(1).weights().getName());
    if (inputs.size() == 3)
    {
        ctx->network()->setWeightsName(bias_weights, inputs.at(2).weights().getName());
    }
    tensorPtr = layer->getOutput(0);
    dims = tensorPtr->getDimensions();

    if (needToExpandDims)
    {
        // Un-expand spatial dims back to 1D
        std::vector<int> axes{3};
        tensorPtr = squeezeTensor(ctx, node, *tensorPtr, axes);
        ASSERT(tensorPtr && "Failed to unsqueeze tensor.", ErrorCode::kUNSUPPORTED_NODE);
    }

    LOG_VERBOSE("Using kernel: " << kernelSize << ", strides: " << strides << ", prepadding: " << begPadding
        << ", postpadding: " << endPadding << ", dilations: " << dilations << ", numOutputs: " << noutput);
    LOG_VERBOSE("Convolution output dimensions: " << dims);
    return {{tensorPtr}};
}

Let's briefly walk through this code. First, it checks whether input number 1 is a tensor. From the onnx model you can see that the inputs of Conv are X (the image), followed by W and B, as shown in the figure below.

[Figure 4-1: The inputs of Conv in the onnx model]

Since indexing starts from 0, input number 1 is the weight W. As mentioned above, in onnx it is interpreted as weights rather than a tensor, so this branch is not taken and we move on. Next, input number 0 of Conv is converted into a tensor, i.e. from onnx2trt's TensorOrWeights to nvinfer1::ITensor, followed by the calculation of the various dimensions. Finally, ctx->network()->addConvolutionNd(*tensorPtr, noutput, kernelSize, kernelWeights, bias_weights) is executed, which is exactly the same as adding the layer by hand; padding, stride and so on are then set manually, and the resulting tensorPtr is the output of the layer.

So the whole onnx parser is essentially calling the C++ API to build the network structure. If you run into an operator it does not support, you can add your own interpretation of it in the source code, rewrite it in whatever way you think is appropriate, and thereby add it to tensorRT. Whether it is a plugin or anything else, it essentially works this way, so you only need to pay attention to builtin_op_importers.cpp and rarely, if ever, need to look at the other files.
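As an illustration of that workflow, the sketch below shows how a hypothetical unsupported activation (called MyMish here, an assumed operator name) could be imported by composing existing TensorRT layers inside builtin_op_importers.cpp. It follows the same pattern as the Conv importer above; helper names such as convertToTensor and registerLayer are taken from that code and may differ between versions:

DEFINE_BUILTIN_OP_IMPORTER(MyMish)
{
    // mish(x) = x * tanh(softplus(x)), composed from existing TensorRT layers
    nvinfer1::ITensor* input = &convertToTensor(inputs.at(0), ctx);

    auto* softplus = ctx->network()->addActivation(*input, nvinfer1::ActivationType::kSOFTPLUS);
    ASSERT(softplus && "Failed to add softplus layer.", ErrorCode::kUNSUPPORTED_NODE);

    auto* tanhLayer = ctx->network()->addActivation(*softplus->getOutput(0), nvinfer1::ActivationType::kTANH);
    ASSERT(tanhLayer && "Failed to add tanh layer.", ErrorCode::kUNSUPPORTED_NODE);

    auto* mul = ctx->network()->addElementWise(*input, *tanhLayer->getOutput(0), nvinfer1::ElementWiseOperation::kPROD);
    ASSERT(mul && "Failed to add elementwise layer.", ErrorCode::kUNSUPPORTED_NODE);

    ctx->registerLayer(mul, getNodeName(node));
    return {{mul->getOutput(0)}};
}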

Summary

In this lesson we learned how to use the onnx parser to build a model, using either the libnvonnxparser.so library file or the parser source code. The library is easy to use but cannot be debugged; the source code looks complicated, but it allows more customized operations and lets you debug and analyze the context. The library and source-code approaches correspond to the infer and tensorRT_Pro repos respectively. In the next lesson we will go from downloading onnx-tensorrt all the way to compiling and running it from scratch.

Origin: blog.csdn.net/qq_40672115/article/details/131883965