8.1. tensorRT Advanced (3) Encapsulation Series: Encapsulating the Model Compilation Process to Simplify Model Compilation Code

Preface

I went through Teacher Du's tensorRT high-performance deployment course from scratch before, but I didn't take notes and have forgotten much of it. This time I'm going through it again and taking notes.

In this lesson we study the tensorRT advanced topic of encapsulating the model compilation process to simplify model compilation code.

Please see the mind map below for the course syllabus

[Mind map: course syllabus]

1. Model compilation process encapsulation

Let's start learning the encapsulation of tensorRT:

1. The encapsulation of tensorRT is essentially an encapsulation of the inference engine.

2. The significance of encapsulation is to standardize the technology and turn it into a tool, making it more convenient and efficient to use and allowing more default behaviors to be customized.

3. The idea of encapsulating an inference engine also applies elsewhere, such as embedded platforms. Since the default APIs provided by most inference engines are not friendly enough, wrapping them makes your code reusable, so one set of code can be used in multiple places.

4. The same encapsulation can also allow switching between different inference backends through simple configuration, depending on your needs.

5. Our only purpose is to make work easier, make code more reusable, and let the technology accumulate.

In this lesson we mainly learn how to encapsulate the builder so that the compilation interface is simple enough.

Let's look at the code. First up is the encapsulation of the CUDA utility tools. The code is as follows:

cuda-tools.hpp

#ifndef CUDA_TOOLS_HPP
#define CUDA_TOOLS_HPP

#include <cuda_runtime.h>
#include <string>

#define checkRuntime(call) CUDATools::check_runtime(call, #call, __LINE__, __FILE__)

#define checkKernel(...)                                                                             \
    __VA_ARGS__;                                                                                     \
    do{cudaError_t cudaStatus = cudaPeekAtLastError();                                               \
    if (cudaStatus != cudaSuccess){                                                                  \
        INFOE("launch failed: %s", cudaGetErrorString(cudaStatus));                                  \
    }} while(0);

namespace CUDATools{

    bool check_runtime(cudaError_t e, const char* call, int iLine, const char *szFile);
    bool check_device_id(int device_id);
    int current_device_id();
    std::string device_description();

    // Automatically switch to the given device id, and switch back on destruction
    class AutoDevice{
    public:
        AutoDevice(int device_id = 0);
        virtual ~AutoDevice();

    private:
        int old_ = -1;
    };
}


#endif // CUDA_TOOLS_HPP

cuda-tools.cpp


/*
 *  System utility functions related to CUDA
 */

#include "cuda-tools.hpp"
#include <stdio.h>
#include <stdarg.h>
#include <string>
#include <simple-logger.hpp>

using namespace std;

namespace CUDATools{

    bool check_runtime(cudaError_t e, const char* call, int line, const char *file){
        if (e != cudaSuccess) {
            INFOE("CUDA Runtime error %s # %s, code = %s [ %d ] in file %s:%d", 
                call, 
                cudaGetErrorString(e), 
                cudaGetErrorName(e), 
                e, file, line
            );
            return false;
        }
        return true;
    }

    bool check_device_id(int device_id){
        int device_count = -1;
        checkRuntime(cudaGetDeviceCount(&device_count));
        if(device_id < 0 || device_id >= device_count){
            INFOE("Invalid device id: %d, count = %d", device_id, device_count);
            return false;
        }
        return true;
    }

    static std::string format(const char* fmt, ...) {
        va_list vl;
        va_start(vl, fmt);
        char buffer[2048];
        vsnprintf(buffer, sizeof(buffer), fmt, vl);
        va_end(vl);
        return buffer;
    }

    string device_description(){
        cudaDeviceProp prop;
        size_t free_mem, total_mem;
        int device_id = 0;

        checkRuntime(cudaGetDevice(&device_id));
        checkRuntime(cudaGetDeviceProperties(&prop, device_id));
        checkRuntime(cudaMemGetInfo(&free_mem, &total_mem));

        return format(
            "[ID %d]<%s>[arch %d.%d][GMEM %.2f GB/%.2f GB]",
            device_id, prop.name, prop.major, prop.minor, 
            free_mem / 1024.0f / 1024.0f / 1024.0f,
            total_mem / 1024.0f / 1024.0f / 1024.0f
        );
    }

    int current_device_id(){
        int device_id = 0;
        checkRuntime(cudaGetDevice(&device_id));
        return device_id;
    }

    AutoDevice::AutoDevice(int device_id){
        cudaGetDevice(&old_);
        checkRuntime(cudaSetDevice(device_id));
    }

    AutoDevice::~AutoDevice(){
        checkRuntime(cudaSetDevice(old_));
    }
}

The encapsulation of the CUDA toolset starts with two macro definitions:

checkRuntime(call) : a macro that wraps CUDA runtime calls. It records the call text, file name, and line number, and prints an error message when the call fails.

checkKernel(…) : a macro that checks the execution of a CUDA kernel. It first launches the kernel and then checks whether the launch produced an error; if so, it outputs an error message.

Several functions and classes are implemented in the namespace CUDATools (a usage sketch follows the list):

  • check_runtime : checks a CUDA runtime error code and prints detailed information, including the call text, error name, error string, file name, and line number
  • check_device_id : checks whether the given device ID is valid, i.e. within the range of available devices
  • current_device_id : returns the ID of the current CUDA device
  • device_description : returns a short string describing the current device (ID, name, compute capability, memory)
  • AutoDevice class: an RAII-style helper that automatically sets and restores the CUDA device. Constructing an object switches to the specified device ID (default 0), and the destructor switches back to the original device
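
To see how these pieces fit together, here is a minimal usage sketch of my own (not from the course code). The fill_ones kernel and the buffer size are made up for illustration, and the file would be compiled as a .cu file with nvcc:

// main.cu -- hypothetical example of using the CUDATools helpers
#include <cuda_runtime.h>
#include "cuda-tools.hpp"
#include "simple-logger.hpp"   // provides INFO / INFOE used by the macros

// hypothetical kernel, only here to demonstrate checkKernel
__global__ void fill_ones(float* data, int n){
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if(idx < n) data[idx] = 1.0f;
}

int main(){
    INFO("Device: %s", CUDATools::device_description().c_str());

    // switch to device 0 for this scope; the destructor switches back automatically
    CUDATools::AutoDevice auto_device(0);

    int n = 1024;
    float* gpu_ptr = nullptr;

    // checkRuntime logs the call text, file and line if cudaMalloc fails
    checkRuntime(cudaMalloc(&gpu_ptr, n * sizeof(float)));

    // checkKernel launches the kernel, then checks cudaPeekAtLastError
    checkKernel(fill_ones<<<(n + 255) / 256, 256>>>(gpu_ptr, n));

    checkRuntime(cudaDeviceSynchronize());
    checkRuntime(cudaFree(gpu_ptr));
    return 0;
}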

Next, let's look at the encapsulation of the logger. The code is as follows:

simple-logger.hpp

#ifndef SIMPLE_LOGGER_HPP
#define SIMPLE_LOGGER_HPP

#include <stdio.h>

#define INFOD(...)			SimpleLogger::__log_func(__FILE__, __LINE__, SimpleLogger::LogLevel::Debug, __VA_ARGS__)
#define INFOV(...)			SimpleLogger::__log_func(__FILE__, __LINE__, SimpleLogger::LogLevel::Verbose, __VA_ARGS__)
#define INFO(...)			SimpleLogger::__log_func(__FILE__, __LINE__, SimpleLogger::LogLevel::Info, __VA_ARGS__)
#define INFOW(...)			SimpleLogger::__log_func(__FILE__, __LINE__, SimpleLogger::LogLevel::Warning, __VA_ARGS__)
#define INFOE(...)			SimpleLogger::__log_func(__FILE__, __LINE__, SimpleLogger::LogLevel::Error, __VA_ARGS__)
#define INFOF(...)			SimpleLogger::__log_func(__FILE__, __LINE__, SimpleLogger::LogLevel::Fatal, __VA_ARGS__)


namespace SimpleLogger{

    enum class LogLevel : int{
        Debug   = 5,
        Verbose = 4,
        Info    = 3,
        Warning = 2,
        Error   = 1,
        Fatal   = 0
    };

    void set_log_level(LogLevel level);
    LogLevel get_log_level();
    void __log_func(const char* file, int line, LogLevel level, const char* fmt, ...);

};  // SimpleLogger

#endif // SIMPLE_LOGGER_HPP

simple-logger.cpp


#include "simple-logger.hpp"
#include <string>
#include <stdarg.h>

using namespace std;

namespace SimpleLogger{

    static LogLevel g_level = LogLevel::Info;

    const char* level_string(LogLevel level){
        switch (level){
            case LogLevel::Debug: return "debug";
            case LogLevel::Verbose: return "verbo";
            case LogLevel::Info: return "info";
            case LogLevel::Warning: return "warn";
            case LogLevel::Error: return "error";
            case LogLevel::Fatal: return "fatal";
            default: return "unknown";
        }
    }

    void set_log_level(LogLevel level){
        g_level = level;
    }

    LogLevel get_log_level(){
        return g_level;
    }

    string file_name(const string& path, bool include_suffix){
        if (path.empty()) return "";

        int p = path.rfind('/');
        p += 1;

        // include suffix
        if (include_suffix)
            return path.substr(p);

        int u = path.rfind('.');
        if (u == -1)
            return path.substr(p);

        if (u <= p) u = path.size();
        return path.substr(p, u - p);
    }

    string time_now(){
        char time_string[20];
        time_t timep;
        time(&timep);
        tm& t = *(tm*)localtime(&timep);

        sprintf(time_string, "%04d-%02d-%02d %02d:%02d:%02d", t.tm_year + 1900, t.tm_mon + 1, t.tm_mday, t.tm_hour, t.tm_min, t.tm_sec);
        return time_string;
    }

    void __log_func(const char* file, int line, LogLevel level, const char* fmt, ...){
        if(level > g_level) return;

        va_list vl;
        va_start(vl, fmt);

        char buffer[2048];
        auto now = time_now();
        string filename = file_name(file, true);
        int n = snprintf(buffer, sizeof(buffer), "[%s]", now.c_str());

        if (level == LogLevel::Fatal || level == LogLevel::Error) {
            n += snprintf(buffer + n, sizeof(buffer) - n, "[\033[31m%s\033[0m]", level_string(level));
        }
        else if (level == LogLevel::Warning) {
            n += snprintf(buffer + n, sizeof(buffer) - n, "[\033[33m%s\033[0m]", level_string(level));
        }
        else if (level == LogLevel::Info) {
            n += snprintf(buffer + n, sizeof(buffer) - n, "[\033[35m%s\033[0m]", level_string(level));
        }
        else if (level == LogLevel::Verbose) {
            n += snprintf(buffer + n, sizeof(buffer) - n, "[\033[34m%s\033[0m]", level_string(level));
        }
        else {
            n += snprintf(buffer + n, sizeof(buffer) - n, "[%s]", level_string(level));
        }

        n += snprintf(buffer + n, sizeof(buffer) - n, "[%s:%d]:", filename.c_str(), line);
        vsnprintf(buffer + n, sizeof(buffer) - n, fmt, vl);
        va_end(vl);
        fprintf(stdout, "%s\n", buffer);

        if(level == LogLevel::Fatal || level == LogLevel::Error){
            fflush(stdout);
            abort();
        }
    }
};

The above is a simple logging tool. By defining a few macros, it provides convenient logging functionality:

  • With the macros ( INFOD , INFOV , INFO , INFOW , INFOE , INFOF ), developers can easily add log messages in the code.
  • Each macro passes the current file name ( __FILE__ ), line number ( __LINE__ ), log level, and log message to the __log_func function.
  • __log_func is the function that performs the actual logging. It formats log messages using a variadic argument list ( va_list ) and vsnprintf, and adds color to the message based on the log level.

This logging tool provides a simple and effective way to add logging to an application. Its design makes it easy to add, modify, and control log messages, which in turn makes bugs easier to locate. For larger projects, a basic component like a logger is essential; without one, debugging becomes painful when problems arise.
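
As a quick illustration (my own sketch, not part of the course code), this is how the logging macros are typically used:

#include "simple-logger.hpp"

int main(){
    // only messages whose level is <= Verbose will be printed
    SimpleLogger::set_log_level(SimpleLogger::LogLevel::Verbose);

    int batch = 16;
    INFO("engine built, max batch size = %d", batch);   // printed, [info] colored
    INFOW("fp16 not supported on this platform");       // printed, [warn] colored
    INFOD("this debug message is filtered out");        // Debug > Verbose, so skipped
    // INFOE("something failed");                       // would print in red and then abort()
    return 0;
}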

Finally, let's look at the core part: the encapsulation of tensorRT model compilation. The code is as follows:

trt_builder.hpp



#ifndef TRT_BUILDER_HPP
#define TRT_BUILDER_HPP

#include <string>
#include <vector>
#include <functional>

namespace TRT {

	enum class Mode : int {
		FP32,
		FP16
	};

	const char* mode_string(Mode type);

	bool compile(
		Mode mode,
		unsigned int maxBatchSize,
		const std::string& source,
		const std::string& saveto,
		const size_t maxWorkspaceSize = 1ul << 30                // 1ul << 30 = 1GB
	);
};

#endif //TRT_BUILDER_HPP

trt_builder.cpp


#include "trt_builder.hpp"

#include <cuda_runtime_api.h>
#include <cublas_v2.h>
#include <NvInfer.h>
#include <NvInferPlugin.h>
//#include <NvCaffeParser.h>
#include <onnx-tensorrt/NvOnnxParser.h>
#include <string>
#include <vector>
#include <iostream>
#include <memory>
#include <sstream>
#include <assert.h>
#include <stdarg.h>
#include "cuda-tools.hpp"
#include "simple-logger.hpp"
#include <chrono>

using namespace nvinfer1;
using namespace std;   
//using namespace nvcaffeparser1  ;

class Logger : public ILogger {
public:
	virtual void log(Severity severity, const char* msg) noexcept override {

		if (severity == Severity::kINTERNAL_ERROR) {
			INFOE("NVInfer INTERNAL_ERROR: %s", msg);
			abort();
		}
		else if (severity == Severity::kERROR) {
			INFOE("NVInfer: %s", msg);
		}
		else if (severity == Severity::kWARNING) {
			INFOW("NVInfer: %s", msg);
		}
		else if (severity == Severity::kINFO) {
			INFOD("NVInfer: %s", msg);
		}
		else {
			INFOD("%s", msg);
		}
	}
};

static Logger gLogger;

namespace TRT {

	static string join_dims(const vector<int>& dims){
		stringstream output;
		char buf[64];
		const char* fmts[] = {"%d", " x %d"};
		for(int i = 0; i < dims.size(); ++i){
			snprintf(buf, sizeof(buf), fmts[i != 0], dims[i]);
			output << buf;
		}
		return output.str();
	}

	bool save_file(const string& file, const void* data, size_t length){

		FILE* f = fopen(file.c_str(), "wb");
		if (!f) return false;

		if (data && length > 0){
			if (fwrite(data, 1, length, f) != length){
				fclose(f);
				return false;
			}
		}
		fclose(f);
		return true;
	}

	static string format(const char* fmt, ...) {
		va_list vl;
		va_start(vl, fmt);
		char buffer[10000];
		vsnprintf(buffer, sizeof(buffer), fmt, vl);
		va_end(vl);
		return buffer;
	}

	static string dims_str(const nvinfer1::Dims& dims){
		return join_dims(vector<int>(dims.d, dims.d + dims.nbDims));
	}

	static const char* padding_mode_name(nvinfer1::PaddingMode mode){
		switch(mode){
			case nvinfer1::PaddingMode::kEXPLICIT_ROUND_DOWN: return "explicit round down";
			case nvinfer1::PaddingMode::kEXPLICIT_ROUND_UP: return "explicit round up";
			case nvinfer1::PaddingMode::kSAME_UPPER: return "same upper";
			case nvinfer1::PaddingMode::kSAME_LOWER: return "same lower";
			case nvinfer1::PaddingMode::kCAFFE_ROUND_DOWN: return "caffe round down";
			case nvinfer1::PaddingMode::kCAFFE_ROUND_UP: return "caffe round up";
		}
		return "Unknown padding mode";
	}

	static const char* pooling_type_name(nvinfer1::PoolingType type){
		switch(type){
			case nvinfer1::PoolingType::kMAX: return "MaxPooling";
			case nvinfer1::PoolingType::kAVERAGE: return "AveragePooling";
			case nvinfer1::PoolingType::kMAX_AVERAGE_BLEND: return "MaxAverageBlendPooling";
		}
		return "Unknown pooling type";
	}

	static const char* activation_type_name(nvinfer1::ActivationType activation_type){
		switch(activation_type){
			case nvinfer1::ActivationType::kRELU: return "ReLU";
			case nvinfer1::ActivationType::kSIGMOID: return "Sigmoid";
			case nvinfer1::ActivationType::kTANH: return "TanH";
			case nvinfer1::ActivationType::kLEAKY_RELU: return "LeakyRelu";
			case nvinfer1::ActivationType::kELU: return "Elu";
			case nvinfer1::ActivationType::kSELU: return "Selu";
			case nvinfer1::ActivationType::kSOFTSIGN: return "Softsign";
			case nvinfer1::ActivationType::kSOFTPLUS: return "Parametric softplus";
			case nvinfer1::ActivationType::kCLIP: return "Clip";
			case nvinfer1::ActivationType::kHARD_SIGMOID: return "Hard sigmoid";
			case nvinfer1::ActivationType::kSCALED_TANH: return "Scaled tanh";
			case nvinfer1::ActivationType::kTHRESHOLDED_RELU: return "Thresholded ReLU";
		}
		return "Unknow activation type";
	}

	static string layer_type_name(nvinfer1::ILayer* layer){
		switch(layer->getType()){
			case nvinfer1::LayerType::kCONVOLUTION: return "Convolution";
			case nvinfer1::LayerType::kFULLY_CONNECTED: return "Fully connected";
			case nvinfer1::LayerType::kACTIVATION: {
				nvinfer1::IActivationLayer* act = (nvinfer1::IActivationLayer*)layer;
				auto type = act->getActivationType();
				return activation_type_name(type);
			}
			case nvinfer1::LayerType::kPOOLING: {
				nvinfer1::IPoolingLayer* pool = (nvinfer1::IPoolingLayer*)layer;
				return pooling_type_name(pool->getPoolingType());
			}
			case nvinfer1::LayerType::kLRN: return "LRN";
			case nvinfer1::LayerType::kSCALE: return "Scale";
			case nvinfer1::LayerType::kSOFTMAX: return "SoftMax";
			case nvinfer1::LayerType::kDECONVOLUTION: return "Deconvolution";
			case nvinfer1::LayerType::kCONCATENATION: return "Concatenation";
			case nvinfer1::LayerType::kELEMENTWISE: return "Elementwise";
			case nvinfer1::LayerType::kPLUGIN: return "Plugin";
			case nvinfer1::LayerType::kUNARY: return "UnaryOp operation";
			case nvinfer1::LayerType::kPADDING: return "Padding";
			case nvinfer1::LayerType::kSHUFFLE: return "Shuffle";
			case nvinfer1::LayerType::kREDUCE: return "Reduce";
			case nvinfer1::LayerType::kTOPK: return "TopK";
			case nvinfer1::LayerType::kGATHER: return "Gather";
			case nvinfer1::LayerType::kMATRIX_MULTIPLY: return "Matrix multiply";
			case nvinfer1::LayerType::kRAGGED_SOFTMAX: return "Ragged softmax";
			case nvinfer1::LayerType::kCONSTANT: return "Constant";
			case nvinfer1::LayerType::kRNN_V2: return "RNNv2";
			case nvinfer1::LayerType::kIDENTITY: return "Identity";
			case nvinfer1::LayerType::kPLUGIN_V2: return "PluginV2";
			case nvinfer1::LayerType::kSLICE: return "Slice";
			case nvinfer1::LayerType::kSHAPE: return "Shape";
			case nvinfer1::LayerType::kPARAMETRIC_RELU: return "Parametric ReLU";
			case nvinfer1::LayerType::kRESIZE: return "Resize";
		}
		return "Unknow layer type";
	}

	static string layer_descript(nvinfer1::ILayer* layer){
		switch(layer->getType()){
			case nvinfer1::LayerType::kCONVOLUTION: {
				nvinfer1::IConvolutionLayer* conv = (nvinfer1::IConvolutionLayer*)layer;
				return format("channel: %d, kernel: %s, padding: %s, stride: %s, dilation: %s, group: %d", 
					conv->getNbOutputMaps(),
					dims_str(conv->getKernelSizeNd()).c_str(),
					dims_str(conv->getPaddingNd()).c_str(),
					dims_str(conv->getStrideNd()).c_str(),
					dims_str(conv->getDilationNd()).c_str(),
					conv->getNbGroups()
				);
			}
			case nvinfer1::LayerType::kFULLY_CONNECTED:{
				nvinfer1::IFullyConnectedLayer* fully = (nvinfer1::IFullyConnectedLayer*)layer;
				return format("output channels: %d", fully->getNbOutputChannels());
			}
			case nvinfer1::LayerType::kPOOLING: {
				nvinfer1::IPoolingLayer* pool = (nvinfer1::IPoolingLayer*)layer;
				return format(
					"window: %s, padding: %s",
					dims_str(pool->getWindowSizeNd()).c_str(),
					dims_str(pool->getPaddingNd()).c_str()
				);   
			}
			case nvinfer1::LayerType::kDECONVOLUTION:{
				nvinfer1::IDeconvolutionLayer* conv = (nvinfer1::IDeconvolutionLayer*)layer;
				return format("channel: %d, kernel: %s, padding: %s, stride: %s, group: %d", 
					conv->getNbOutputMaps(),
					dims_str(conv->getKernelSizeNd()).c_str(),
					dims_str(conv->getPaddingNd()).c_str(),
					dims_str(conv->getStrideNd()).c_str(),
					conv->getNbGroups()
				);
			}
			case nvinfer1::LayerType::kACTIVATION:
			case nvinfer1::LayerType::kPLUGIN:
			case nvinfer1::LayerType::kLRN:
			case nvinfer1::LayerType::kSCALE:
			case nvinfer1::LayerType::kSOFTMAX:
			case nvinfer1::LayerType::kCONCATENATION:
			case nvinfer1::LayerType::kELEMENTWISE:
			case nvinfer1::LayerType::kUNARY:
			case nvinfer1::LayerType::kPADDING:
			case nvinfer1::LayerType::kSHUFFLE:
			case nvinfer1::LayerType::kREDUCE:
			case nvinfer1::LayerType::kTOPK:
			case nvinfer1::LayerType::kGATHER:
			case nvinfer1::LayerType::kMATRIX_MULTIPLY:
			case nvinfer1::LayerType::kRAGGED_SOFTMAX:
			case nvinfer1::LayerType::kCONSTANT:
			case nvinfer1::LayerType::kRNN_V2:
			case nvinfer1::LayerType::kIDENTITY:
			case nvinfer1::LayerType::kPLUGIN_V2:
			case nvinfer1::LayerType::kSLICE:
			case nvinfer1::LayerType::kSHAPE:
			case nvinfer1::LayerType::kPARAMETRIC_RELU:
			case nvinfer1::LayerType::kRESIZE:
				return "";
		}
		return "Unknow layer type";
	}

	static bool layer_has_input_tensor(nvinfer1::ILayer* layer){
		int num_input = layer->getNbInputs();
		for(int i = 0; i < num_input; ++i){
			auto input = layer->getInput(i);
			if(input == nullptr)
				continue;

			if(input->isNetworkInput())
				return true;
		}
		return false;
	}

	static bool layer_has_output_tensor(nvinfer1::ILayer* layer){
		int num_output = layer->getNbOutputs();
		for(int i = 0; i < num_output; ++i){

			auto output = layer->getOutput(i);
			if(output == nullptr)
				continue;

			if(output->isNetworkOutput())
				return true;
		}
		return false;
	}  

	template<typename _T>
	shared_ptr<_T> make_nvshared(_T* ptr){
		return shared_ptr<_T>(ptr, [](_T* p){p->destroy();});
	}

	const char* mode_string(Mode type) {
		switch (type) {
		case Mode::FP32:
			return "FP32";
		case Mode::FP16:
			return "FP16";
		default:
			return "UnknownTRTMode";
		}
	}

	static nvinfer1::Dims convert_to_trt_dims(const std::vector<int>& dims){

		nvinfer1::Dims output{0};
		if(dims.size() > nvinfer1::Dims::MAX_DIMS){
			INFOE("convert failed, dims.size[%d] > MAX_DIMS[%d]", dims.size(), nvinfer1::Dims::MAX_DIMS);
			return output;
		}

		if(!dims.empty()){
			output.nbDims = dims.size();
			memcpy(output.d, dims.data(), dims.size() * sizeof(int));
		}
		return output;
	}

	static string align_blank(const string& input, int align_size, char blank = ' '){
        if(input.size() >= align_size) return input;
        string output = input;
        for(int i = 0; i < align_size - input.size(); ++i)
            output.push_back(blank);
        return output;
    }

	static long long timestamp_now() {
        return chrono::duration_cast<chrono::milliseconds>(chrono::system_clock::now().time_since_epoch()).count();
    }

	static double timestamp_now_float() {
        return chrono::duration_cast<chrono::microseconds>(chrono::system_clock::now().time_since_epoch()).count() / 1000.0;
    }


	bool compile(
		Mode mode,
		unsigned int maxBatchSize,
		const string& source,
		const string& saveto,
		const size_t maxWorkspaceSize) {

		INFO("Compile %s %s.", mode_string(mode), source.c_str());
		auto builder = make_nvshared(createInferBuilder(gLogger));
		if (builder == nullptr) {
			INFOE("Can not create builder.");
			return false;
		}

		auto config = make_nvshared(builder->createBuilderConfig());
		if (mode == Mode::FP16) {
			if (!builder->platformHasFastFp16()) {
				INFOW("Platform not have fast fp16 support");
			}
			config->setFlag(BuilderFlag::kFP16);
		}

		shared_ptr<INetworkDefinition> network;
		//shared_ptr<ICaffeParser> caffeParser;
		const auto explicitBatch = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
		network = make_nvshared(builder->createNetworkV2(explicitBatch));
		
		shared_ptr<nvonnxparser::IParser> onnxParser = make_nvshared(nvonnxparser::createParser(*network, gLogger));
		if (onnxParser == nullptr) {
			INFOE("Can not create parser.");
			return false;
		}

		if (!onnxParser->parseFromFile(source.c_str(), 1)) {
			INFOE("Can not parse OnnX file: %s", source.c_str());
			return false;
		}

		auto inputTensor = network->getInput(0);
		auto inputDims = inputTensor->getDimensions();

		INFO("Input shape is %s", join_dims(vector<int>(inputDims.d, inputDims.d + inputDims.nbDims)).c_str());
		INFO("Set max batch size = %d", maxBatchSize);
		INFO("Set max workspace size = %.2f MB", maxWorkspaceSize / 1024.0f / 1024.0f);
		INFO("Base device: %s", CUDATools::device_description().c_str());

		int net_num_input = network->getNbInputs();
		INFO("Network has %d inputs:", net_num_input);
		vector<string> input_names(net_num_input);
		for(int i = 0; i < net_num_input; ++i){
			auto tensor = network->getInput(i);
			auto dims = tensor->getDimensions();
			auto dims_str = join_dims(vector<int>(dims.d, dims.d+dims.nbDims));
			INFO("      %d.[%s] shape is %s", i, tensor->getName(), dims_str.c_str());

			input_names[i] = tensor->getName();
		}

		int net_num_output = network->getNbOutputs();
		INFO("Network has %d outputs:", net_num_output);
		for(int i = 0; i < net_num_output; ++i){
			auto tensor = network->getOutput(i);
			auto dims = tensor->getDimensions();
			auto dims_str = join_dims(vector<int>(dims.d, dims.d+dims.nbDims));
			INFO("      %d.[%s] shape is %s", i, tensor->getName(), dims_str.c_str());
		}

		int net_num_layers = network->getNbLayers();
		INFO("Network has %d layers:", net_num_layers);
		for(int i = 0; i < net_num_layers; ++i){
			auto layer = network->getLayer(i);
			auto name = layer->getName();
			auto type_str = layer_type_name(layer);
			auto input0 = layer->getInput(0);
			if(input0 == nullptr) continue;
			
			auto output0 = layer->getOutput(0);
			auto input_dims = input0->getDimensions();
			auto output_dims = output0->getDimensions();
			bool has_input = layer_has_input_tensor(layer);
			bool has_output = layer_has_output_tensor(layer);
			auto descript = layer_descript(layer);
			type_str = align_blank(type_str, 18);
			auto input_dims_str = align_blank(dims_str(input_dims), 18);
			auto output_dims_str = align_blank(dims_str(output_dims), 18);
			auto number_str = align_blank(format("%d.", i), 4);

			const char* token = "      ";
			if(has_input)
				token = "  >>> ";
			else if(has_output)
				token = "  *** ";

			INFOV("%s%s%s %s-> %s%s", token, 
				number_str.c_str(), 
				type_str.c_str(),
				input_dims_str.c_str(),
				output_dims_str.c_str(),
				descript.c_str()
			);
		}
		
		builder->setMaxBatchSize(maxBatchSize);
		config->setMaxWorkspaceSize(maxWorkspaceSize);

		auto profile = builder->createOptimizationProfile();
		for(int i = 0; i < net_num_input; ++i){
			auto input = network->getInput(i);
			auto input_dims = input->getDimensions();
			input_dims.d[0] = 1;
			profile->setDimensions(input->getName(), nvinfer1::OptProfileSelector::kMIN, input_dims);
			profile->setDimensions(input->getName(), nvinfer1::OptProfileSelector::kOPT, input_dims);
			input_dims.d[0] = maxBatchSize;
			profile->setDimensions(input->getName(), nvinfer1::OptProfileSelector::kMAX, input_dims);
		}

		// not needed
		// for(int i = 0; i < net_num_output; ++i){
		// 	auto output = network->getOutput(i);
		// 	auto output_dims = output->getDimensions();
		// 	output_dims.d[0] = 1;
		// 	profile->setDimensions(output->getName(), nvinfer1::OptProfileSelector::kMIN, output_dims);
		// 	profile->setDimensions(output->getName(), nvinfer1::OptProfileSelector::kOPT, output_dims);
		// 	output_dims.d[0] = maxBatchSize;
		// 	profile->setDimensions(output->getName(), nvinfer1::OptProfileSelector::kMAX, output_dims);
		// }
		config->addOptimizationProfile(profile);

		// error on jetson
		// auto timing_cache = shared_ptr<nvinfer1::ITimingCache>(config->createTimingCache(nullptr, 0), [](nvinfer1::ITimingCache* ptr){ptr->reset();});
		// config->setTimingCache(*timing_cache, false);
		// config->setFlag(BuilderFlag::kGPU_FALLBACK);
		// config->setDefaultDeviceType(DeviceType::kDLA);
		// config->setDLACore(0);

		INFO("Building engine...");
		auto time_start = timestamp_now();
		auto engine = make_nvshared(builder->buildEngineWithConfig(*network, *config));
		if (engine == nullptr) {
			INFOE("engine is nullptr");
			return false;
		}
		INFO("Build done %lld ms !", timestamp_now() - time_start);
		
		// serialize the engine, then close everything down
		auto seridata = make_nvshared(engine->serialize());
		return save_file(saveto, seridata->data(), seridata->size());
	}
}; // namespace TRT

The model compilation encapsulation can be divided into the following parts (summary from chatGPT):

1. Log processing :

  • A custom Logger class, which inherits from ILogger , is used to process TensorRT’s log messages.
  • Depending on the severity of the message, it will print different types of log messages.

2. Utility functions :

  • join_dims : formats tensor dimensions as a string.
  • save_file : saves the given data to a file.
  • format : a simple string formatting function.
  • dims_str : converts an nvinfer1::Dims object to its string representation.

3. The compile function : the core entry point that runs the whole build flow, covered in detail below.

4. Other functions : the file also contains auxiliary helpers such as timestamp acquisition, layer and activation name lookup, and CUDA error handling.

5. Error handling : possible errors during the compilation process are checked, such as parsing errors and file writing errors, and corresponding error messages are reported when they occur.

This encapsulation provides the complete flow from an ONNX model to an optimized TensorRT inference engine. Its main purpose is to simplify the use of TensorRT, so that the user only needs to call one function to complete model import, optimization, and serialization.

Let's focus on the content in the compile function, which mainly includes the following parts:

1. TensorRT builder initialization

The function first creates a TensorRT builder instance and sets the log for it. This builder is the main component used to create inference engines in the TensorRT framework.

auto builder = make_nvshared(nvinfer1::createInferBuilder(gLogger));

2. Configuration settings

Set the maximum batch size on the builder and the maximum workspace size on the builder config. These parameters control the resource usage and performance of the inference engine.

builder->setMaxBatchSize(maxBatchSize);
config->setMaxWorkspaceSize(maxWorkspaceSize);

3. Precision mode

Sets the precision mode according to the mode passed in (FP16 or FP32). If FP16 is selected, the engine uses half-precision floating point for computation, which can improve performance at the cost of some precision.

if (mode == Mode::FP16) {
    if (!builder->platformHasFastFp16()) {
        INFOW("Platform not have fast fp16 support");
    }
    config->setFlag(BuilderFlag::kFP16);
}

4. Import ONNX model

Import the model using TensorRT's ONNX parser. This reads the ONNX model from a file and parses it into a format that TensorRT can understand.

auto onnxParser = make_nvshared(nvonnxparser::createParser(*network, gLogger));
if (!onnxParser->parseFromFile(source.c_str(), 1)) {
    INFOE("Can not parse OnnX file: %s", source.c_str());
    return false;
}

5. Optimization profile

Create an optimization profile for the network inputs. The profile defines the minimum, optimal, and maximum input shapes, which allows TensorRT to optimize the engine for inputs of different batch sizes.

auto profile = builder->createOptimizationProfile();
for (int i = 0; i < net_num_input; ++i) {
    auto input = network->getInput(i);
    auto input_dims = input->getDimensions();
    input_dims.d[0] = 1;
    profile->setDimensions(input->getName(), nvinfer1::OptProfileSelector::kMIN, input_dims);
    profile->setDimensions(input->getName(), nvinfer1::OptProfileSelector::kOPT, input_dims);
    input_dims.d[0] = maxBatchSize;
    profile->setDimensions(input->getName(), nvinfer1::OptProfileSelector::kMAX, input_dims);
}
config->addOptimizationProfile(profile);

6. Build the engine

Build the TensorRT engine using the previous configuration and network information. This is a time-consuming process because TensorRT tries multiple optimization techniques to improve the model's inference speed.

auto engine = make_nvshared(builder->buildEngineWithConfig(*network, *config));

7. Serialize the engine

Finally, the built engine is serialized so that it can be reloaded without the original model and configuration. The serialized engine is saved to a file.

auto seridata = make_nvshared(engine->serialize());
return save_file(saveto, seridata->data(), seridata->size());

This encapsulation covers the entire process from the original ONNX model to an optimized TensorRT inference engine. It simplifies TensorRT model compilation so that the user only needs to call a single function, as the sketch below shows.
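
To make the "one function call" point concrete, here is a minimal sketch of what a caller might look like (my own example; the classifier.onnx and engine.trtmodel file names are hypothetical):

#include "trt_builder.hpp"
#include "simple-logger.hpp"

int main(){
    // compile an ONNX model into a serialized TensorRT engine file;
    // maxWorkspaceSize keeps its 1 GB default value
    bool ok = TRT::compile(
        TRT::Mode::FP32,        // or TRT::Mode::FP16
        10,                     // maxBatchSize
        "classifier.onnx",      // source ONNX model
        "engine.trtmodel"       // where the serialized engine is saved
    );

    if(!ok){
        INFOE("compile failed");  // note: INFOE also aborts the program in this logger
    }
    INFO("compile done, engine saved to engine.trtmodel");
    return 0;
}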

Without further ado, let's run it. The output looks like this:


Figure 1-1 Builder encapsulation example output

You can see that each line prints a timestamp, then the log level, then the file name and line number. This is the logger at work.

During compilation we can also obtain detailed information, including the network structure, as shown below:


Figure 1-2 Detailed network structure information

That covers the encapsulation of the builder. Although much of this code is verbose logging that does nothing for performance, it is very helpful for debugging.

2. Q&A session

The following content comes from the conversation between the blogger and chatGPT.

Question 1 : I'm curious how variables like call, #call, __LINE__, and __FILE__ in the checkRuntime macro work when they are never defined anywhere. How does the code recognize them?

Answer : These are features of the C/C++ preprocessor. In a macro definition, you can use special macro operators to achieve extra functionality. Let's go through the elements used in the checkRuntime macro in detail:

1. call:

  • In a macro definition, call is a macro parameter. When you use the macro and pass in an argument, every occurrence of call is replaced with that argument. For example, if you write checkRuntime(cudaMalloc(…)); then during macro expansion call is replaced with cudaMalloc(…) .

2. #call:

  • # is a macro operator, known as the "stringification" operator. When it appears in a macro definition and precedes a macro argument, it converts the argument to a string literal. Therefore, #call will convert the call argument to a string. For example, if call is cudaMalloc(...) then #call is "cudaMalloc(...)" .

3. __LINE__ and __FILE__ :

  • These are predefined macros. They do not need to be defined manually.
  • __LINE__ will be replaced with the current source code line number.
  • __FILE__ will be replaced with the current source filename.

So, when you write checkRuntime(cudaMalloc(…)); in your code, the preprocessor expands it to:

CUDATools::check_runtime(cudaMalloc(...), "cudaMalloc(...)", <current line number>, "<current file name>");

In this way, the specific function called, line number and file name can be obtained in the check_runtime function, making the error report more detailed and useful.
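
As a small standalone illustration of the stringification operator (my own example, unrelated to the course code):

#include <stdio.h>

// SHOW(expr) prints the text of the expression and its value
#define SHOW(expr)  printf("%s = %d\n", #expr, (expr))

int main(){
    int a = 3, b = 4;
    SHOW(a + b);   // expands to printf("%s = %d\n", "a + b", (a + b)); and prints "a + b = 7"
    return 0;
}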

Question 2 : In checkKernel, what do ... and __VA_ARGS__ mean?

Answer : In C++ macros, ... and __VA_ARGS__ are used to handle a variable number of arguments. This is the variadic macro feature, which comes from C99 and was adopted into standard C++ with C++11.

1. ... :

  • In the parameter list of a macro definition, ... indicates that the macro can accept a variable number of parameters.
  • Using ... allows you to provide any number of arguments to the macro.

2. __VA_ARGS__ :

  • In the substitution section of a macro, you can use __VA_ARGS__ to reference the variadic arguments passed to the macro.
  • __VA_ARGS__ will be replaced with all arguments passed to the macro (except those already explicitly named).

... allows checkKernel to accept any number of arguments. The replacement part of the macro then uses __VA_ARGS__ to emit those arguments as the kernel launch statement before running the error check. This lets you place a CUDA kernel launch directly inside the checkKernel macro.

So when you write the following code:

checkKernel(myKernel<<<blocks, threads>>>(arg1, arg2));

The macro will be expanded to:

myKernel<<<blocks, threads>>>(arg1, arg2);
do {
   cudaError_t cudaStatus = cudaPeekAtLastError();
   if (cudaStatus != cudaSuccess){
       INFOE("launch failed: %s", cudaGetErrorString(cudaStatus));
   }
} while(0);

Thus, ... and __VA_ARGS__ allow you to provide a variable number of arguments to a macro and use those arguments in the macro's definition. This gives macros more flexibility and allows them to handle a variety of different situations.

Summary

In this lesson we learned about builder encapsulation. The tensorRT model compilation process is needed in many places, so it is well worth encapsulating. First, we encapsulated the cuda-tools utilities, which contain macros for checking whether CUDA runtime calls and kernel launches succeed, plus a few device-information helpers. Next, we encapsulated a simple logger that prints the time, log level, file name, and line number with each message, which is very helpful for debugging. Finally, for the core part, the encapsulation of the builder, we focused on the compile function, which wraps the usual tensorRT model compilation workflow. Inside it we output a great deal of information through the logger, and we can even print detailed information about the network structure, which is very practical.

Origin: blog.csdn.net/qq_40672115/article/details/132266114