TensorRT Plugin notes 2

TensorRT Plugin details

I reviewed TensorRT plugins again this week, this time focusing on the code details. The notes below pull together several good reference materials and are shared here.
A brief introduction to TensorRT:

1) The closed source part is the official library, which is the core part of TRT;

2) The open source part is on GitHub and includes the parsers (Caffe, ONNX), the samples, and some plugins.
You need to write two classes:

1) MyCustomPlugin, which inherits from IPluginV2Ext / IPluginV2IOExt / IPluginV2DynamicExt — the plugin class that contains the actual implementation;

2) MyCustomPluginCreator, which inherits from BaseCreator — the plugin factory class used to create the plugin on demand.

class MyCustomPlugin final : public nvinfer1::IPluginV2DynamicExt
class MyCustomPluginCreator : public BaseCreator

Notice:

1) To write a plugin, you need to inherit from one of TRT's plugin base classes;


2) For Static Shape, use IPluginV2IOExt; for Dynamic Shape, use IPluginV2DynamicExt.

Static Shape Plugin API

MyCustomPlugin(int in_channel, nvinfer1::Weights const& weight, nvinfer1::Weights const& bias);  // constructor, used during the network definition stage

MyCustomPlugin(void const* serialData, size_t serialLength);    // constructor, used during the deserialization stage

int getNbOutputs() const;      // get the number of outputs of the layer

nvinfer1::Dims getOutputDimensions(int index, const nvinfer1::Dims* inputs, int nbInputDims);    // get the output dimensions of the layer

nvinfer1::DataType getOutputDataType(int index, const nvinfer1::DataType* inputTypes, int nbInputs) const;     // get the output data type

size_t getSerializationSize() const;      // return how many bytes need to be written to the buffer during serialization

void serialize(void* buffer) const;      // serialization function: write the plugin's parameters and weights into the buffer

const char* getPluginType() const;       // get the plugin type, used for deserialization

const char* getPluginVersion() const;       // get the plugin version, used for deserialization

int initialize();          // initialization, executed before the plugin starts to run; typically allocates device memory for the weights and copies them over

void terminate();         // releases the device memory allocated by initialize

void destroy();           // releases all resources occupied by the plugin

void configurePlugin(const nvinfer1::PluginTensorDesc* in, int nbInput, const nvinfer1::PluginTensorDesc* out, int nbOutput);          // configure the plugin and check that the inputs/outputs meet expectations

bool supportsFormatCombination(int pos, const nvinfer1::PluginTensorDesc* inOut, int nbInputs, int nbOutputs) const;          // check the supported input/output formats and data types

size_t getWorkspaceSize(int maxBatchSize) const;        // get the amount of device memory (workspace) the plugin needs

int enqueue(int batchSize, const void* const* inputs, void** outputs, void* workspace, cudaStream_t stream);   // inference function

void setPluginNamespace(const char* pluginNamespace);            // set the namespace for this plugin; each plugin can define its own namespace (default is "" if not set); note that plugins with the same name in the same namespace will conflict
const char* getPluginNamespace() const;   // get the plugin's namespace
const PluginFieldCollection* getFieldNames();     // (creator method, e.g. GridAnchorBasePluginCreator::getFieldNames) — PluginFieldCollection is mainly used to pass the weights and parameters the plugin op needs
void attachToContext(cudnnContext* cudnnContext, cublasContext* cublasContext, IGpuAllocator* gpuAllocator);  // attach the plugin to an execution context and grant it access to certain context resources
void detachFromContext();      // detach the plugin object from its execution context
Constructor and destructor
Constructor

You can write one to three constructors: usually the first corresponds to the network definition (def), the second to clone, and the third to deserialization.

1. For the network definition stage: the constructor called by the PluginCreator when creating the plugin, which needs to be passed the weight information and parameters. It can also be reused in the clone stage, or you can write a separate clone constructor.

MyCustomPlugin(int in_channel, nvinfer1::Weights const& weight, nvinfer1::Weights const& bias);

2. clone: as the name suggests, clones this plugin object to TensorRT's builder, network or engine. This member function calls the following constructor:

MyCustomPlugin(int in_channel, const std::vector<float>& weight, const std::vector<float>& bias);

Pass the weights and parameters of the plugin to be cloned to this constructor.

IPluginV2DynamicExt* MyCustomPlugin::clone() const

{
	auto plugin = new MyCustomPlugin{_in_channel, _weight, _bias};
    plugin->setPluginNamespace(mPluginNamespace);
    return plugin;
}

The clone member function mainly passes on the constant weights and parameters and makes additional copies of the plugin so that it can be used by different engines, builders or networks.

3. For the deserialization stage: the serialized weights and parameters are passed in to create the plugin.

MyCustomPlugin(void const* serialData, size_t serialLength);

Note that the default constructor needs to be deleted;

MyCustomPlugin() = delete;
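As an illustration, here is a minimal sketch of the definition-stage constructor, assuming the plugin stores its configuration in host-side members _in_channel, _weight and _bias (the names used by the clone call below); this is not the author's exact implementation:

MyCustomPlugin::MyCustomPlugin(int in_channel, nvinfer1::Weights const& weight, nvinfer1::Weights const& bias)
    : _in_channel(in_channel)
{
    // Copy the raw Weights blobs into vectors owned by the plugin,
    // so the plugin does not depend on the caller keeping the memory alive.
    const float* w = static_cast<const float*>(weight.values);
    _weight.assign(w, w + weight.count);
    const float* b = static_cast<const float*>(bias.values);
    _bias.assign(b, b + bias.count);
}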
Destructor

The destructor should call terminate, which releases the device memory this op allocated earlier:

MyCustomPlugin::~MyCustomPlugin()
{
    terminate();
}
Output related functions

1. Get the number of outputs of the layer

int getNbOutputs() const;

2. Given the number of inputs and their dimensions, get the dimensions of the index-th output

nvinfer1::Dims getOutputDimensions(int index, const nvinfer1::Dims* inputs, int nbInputDims);

3. Given the number of inputs and their types, get the data type of the index-th output

nvinfer1::DataType getOutputDataType(int index, const nvinfer1::DataType* inputTypes, int nbInputs) const;
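For a simple single-output, element-wise style op, these three functions might look like the following minimal sketch (the shape and type simply mirror the first input; that rule is an assumption for illustration):

int MyCustomPlugin::getNbOutputs() const
{
    return 1;                 // this example plugin produces a single output
}

nvinfer1::Dims MyCustomPlugin::getOutputDimensions(int index, const nvinfer1::Dims* inputs, int nbInputDims)
{
    return inputs[0];         // output shape equals the shape of the first input
}

nvinfer1::DataType MyCustomPlugin::getOutputDataType(int index, const nvinfer1::DataType* inputTypes, int nbInputs) const
{
    return inputTypes[0];     // output type follows the first input
}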
Serialization and deserialization related functions

1. Return how many bytes need to be written to the buffer during serialization

size_t MyCustomPlugin::getSerializationSize() const
{
    return (serialized_size(_in_channel) + serialized_size(_weight) + serialized_size(_bias));
};

2. The serialization function writes the plugin's parameters and weights into the buffer

void MyCustomPlugin::serialize(void* buffer) const
{
	serialize_value(&buffer, _in_channel);
    serialize_value(&buffer, _weight);
    serialize_value(&buffer, _bias);
};
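For reference, the matching deserialization constructor reads the fields back in the same order they were written. A minimal sketch, assuming the serialize_value/deserialize_value helpers from the TensorRT OSS plugin/common/serialize.hpp:

MyCustomPlugin::MyCustomPlugin(void const* serialData, size_t serialLength)
{
    // Must read in exactly the same order as serialize() wrote
    deserialize_value(&serialData, &serialLength, &_in_channel);
    deserialize_value(&serialData, &serialLength, &_weight);
    deserialize_value(&serialData, &serialLength, &_bias);
}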

3. If this op needs other resources, such as a cuBLAS handle, you can directly use the handle TensorRT provides via attachToContext:

void MyCustomPlugin::attachToContext(cudnnContext* cudnnContext, cublasContext* cublasContext, IGpuAllocator* gpuAllocator)

{
	mCublas = cublasContext;
}

4. Obtain the type and version of the plugin for deserialization

const char* getPluginType() const;

const char* getPluginVersion() const;
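These usually just return fixed strings; note that they must match the name/version reported by the corresponding creator, otherwise deserialization cannot find the plugin. A minimal sketch (the strings are placeholders):

const char* MyCustomPlugin::getPluginType() const
{
    return "MyCustomPlugin";   // must match MyCustomPluginCreator::getPluginName()
}

const char* MyCustomPlugin::getPluginVersion() const
{
    return "1";                // must match MyCustomPluginCreator::getPluginVersion()
}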
Initialization, configuration, destruction functions

Initialization function, executed before the plugin starts to run. It typically allocates device memory for the weights and copies them over.

int initialize();

The terminate function releases the device memory allocated by initialize.

void terminate();

Release all resources occupied by the plugin.

void destroy();
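A minimal sketch of this trio, assuming hypothetical device-side members _d_weight and _d_bias that hold the copied weights:

int MyCustomPlugin::initialize()
{
    // Allocate device memory for the weights and copy them from the host vectors
    size_t wBytes = _weight.size() * sizeof(float);
    size_t bBytes = _bias.size() * sizeof(float);
    if (cudaMalloc(reinterpret_cast<void**>(&_d_weight), wBytes) != cudaSuccess) return -1;
    if (cudaMalloc(reinterpret_cast<void**>(&_d_bias), bBytes) != cudaSuccess) return -1;
    cudaMemcpy(_d_weight, _weight.data(), wBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(_d_bias, _bias.data(), bBytes, cudaMemcpyHostToDevice);
    return 0;
}

void MyCustomPlugin::terminate()
{
    // Free only what initialize() allocated
    if (_d_weight) { cudaFree(_d_weight); _d_weight = nullptr; }
    if (_d_bias)   { cudaFree(_d_bias);   _d_bias = nullptr; }
}

void MyCustomPlugin::destroy()
{
    delete this;   // the plugin object releases itself
}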

The configurePlugin method configures the plugin op and checks that the number and types of its inputs and outputs are correct. The official docs also mention that this configuration information lets TensorRT choose an appropriate algorithm to tune the model. I have not tried automatic tuning yet; the plugin execution code you write yourself is usually fixed, and the so-called tuning step is probably aimed more at the official ops.

void MyCustomPluginDynamic::configurePlugin(const nvinfer1::DynamicPluginTensorDesc* inputs, int nbInputs, const nvinfer1::DynamicPluginTensorDesc* outputs, int nbOutputs)
{
    assert(nbOutputs == 1);
    assert(nbInputs == 2);
    assert(mType == inputs[0].desc.type);
};

TensorRT calls this method to ask whether the input/output at index pos supports the format and data type given by inOut[pos].format and inOut[pos].type. If the plugin supports the format/datatype at inOut[pos], its answer may depend on the formats/datatypes in inOut[0...pos-1], which are guaranteed to be set to values the plugin already accepted. The function does not need to check inOut[pos+1...nbInputs+nbOutputs-1]; the decision for pos must be based only on inOut[0...pos].

bool MyCustomPlugin::supportsFormatCombination(int pos, const nvinfer1::PluginTensorDesc* inOut, int nbInputs, int nbOutputs) 
{
    // assume one input and one output
    assert(0 <= pos && pos < 2);
    const auto *in = inOut;
    const auto *out = inOut + nbInputs;
    switch(pos){
        case 0:
            return in[0].type == DataType::kFLOAT && in[0].format == nvinfer1::TensorFormat::kLINEAR;
        case 1:
            return out[0].type == in[0].type && out[0].format == nvinfer1::TensorFormat::kLINEAR;
    }
};
Run related functions

1. Get the amount of device memory (workspace) the plugin needs. It is best not to call cudaMalloc inside the plugin's enqueue.

size_t getWorkspaceSize(const nvinfer1::PluginTensorDesc* inputs, int nbInputs, const nvinfer1::PluginTensorDesc* outputs, int nbOutputs) const{
    // compute the amount of intermediate device memory this op needs during the forward pass
    size_t need_num = 0;   // placeholder: fill in the actual element count
    return need_num * sizeof(float);
};

2. Inference function

int enqueue(int batchSize, const void* const* inputs, void** outputs, void* workspace, cudaStream_t stream){
    // suppose 'fun' is an intermediate buffer we need; it can live directly in the workspace memory TensorRT allocated for us
    float* fun = static_cast<float*>(workspace);
};

Note that if intermediate device-memory buffers are needed during execution, they can be obtained through the workspace pointer parameter. The .cu kernels written this way default to fp32; in fp16 mode, TensorRT automatically falls back to fp32 when it reaches a plugin op that does not support fp16, and switches back to fp16 once the plugin op has finished.

Setting the max workspace lets device memory be reused and avoids repeated allocations. If a layer allocates device memory with cudaMalloc, the next layer cannot reuse it (the exact reason still needs to be clarified). Also, weights are generally not reused, so they are not placed in the workspace but allocated with cudaMalloc instead.

In static shape mode, the N (batch) dimension is variable, but it must not exceed the max batch size.
Let's take the enqueue function in the official sample lReluPlugin.cpp as an example:

The formula of leakyRelu is as follows:
f(x) = x             (x > 0)
f(x) = negSlope * x  (x <= 0)

int LReLU::enqueue(int batchSize, const void* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) noexcept
{
    const void* inputData = inputs[0];
    void* outputData = outputs[0];
    pluginStatus_t status = lReLUInference(stream, mBatchDim * batchSize, mNegSlope, inputData, outputData);
    return status;
}

The corresponding CUDA kernel function is in lReLU.cu, and I added comments in it for your understanding:

template <unsigned nthdsPerCTA>
__launch_bounds__(nthdsPerCTA) __global__ void pReLUKernel(const int n, const float negativeSlope, const float* input, float* output)
{
    // blockIdx.x is the x-index of the current thread block within the grid; nthdsPerCTA equals blockDim.x, the number of threads per block in x;
    // threadIdx.x is the x-index of the current thread within its block; gridDim.x is the number of thread blocks in x in the grid;
    // i += gridDim.x * nthdsPerCTA means the stride is gridDim.x * nthdsPerCTA, i.e. the total number of threads in the grid.
    for (int i = blockIdx.x * nthdsPerCTA + threadIdx.x; i < n; i += gridDim.x * nthdsPerCTA)
    {
        // negativeSlope is the coefficient alpha
        output[i] = input[i] > 0 ? input[i] : input[i] * negativeSlope;
    }
}

pluginStatus_t lReLUGPU(cudaStream_t stream, const int n, const float negativeSlope, const void* input, void* output)
{
    // n controls the number of elements leakyRelu produces
    const int BS = 512;
    const int GS = (n + BS - 1) / BS;
    // <BS> is a template argument giving the thread-block size, passed to the kernel pReLUKernel()
    pReLUKernel<BS><<<GS, BS, 0, stream>>>(n, negativeSlope, (const float*) input, (float*) output);
    return STATUS_SUCCESS;
}

pluginStatus_t lReLUInference(cudaStream_t stream, const int n, const float negativeSlope, const void* input, void* output)
{
    return lReLUGPU(stream, n, negativeSlope, (const float*) input, (float*) output);
}
IPluginCreator related functions

Overview:

class MyCustomPluginCreator : public BaseCreator

{
public:
	MyCustomPluginCreator();
	~MyCustomPluginCreator() override = default;
	const char* getPluginName() const override;
	const char* getPluginVersion() const override;
	const PluginFieldCollection* getFieldNames() override;
	IPluginV2DynamicExt* createPlugin(const char* name, const nvinfer1::PluginFieldCollection* fc) override;
	IPluginV2DynamicExt* deserializePlugin(const char* name, const void* serialData, size_t serialLength) override;

private:
	static PluginFieldCollection mFC;
	static std::vector<PluginField> mPluginAttributes;
	std::string mNamespace;
};

Get the plugin name and version to identify the creator:

const char* getPluginName() const;

const char* getPluginVersion() const;

Create plugin through PluginFieldCollection

Take out the weights and parameters required by the op one by one, and then call the first constructor mentioned above:

const nvinfer1::PluginFieldCollection* getFieldNames();

IPluginV2DynamicExt* MyCustomPluginCreator::createPlugin(const char* name, const nvinfer1::PluginFieldCollection* fc)
{
    int in_channel = 0;
    std::vector<float> weight;
    std::vector<float> bias;
    const PluginField* fields = fc->fields;
    for (int i = 0; i < fc->nbFields; ++i)
    {
        const char* attrName = fields[i].name;
        if (!strcmp(attrName, "in_channel"))
        {
            ASSERT(fields[i].type == PluginFieldType::kINT32);
            in_channel = *(static_cast<const int32_t*>(fields[i].data));
        }
        else if (!strcmp(attrName, "weight"))
        {
            ASSERT(fields[i].type == PluginFieldType::kFLOAT32);
            int size = fields[i].length;
            weight.reserve(size);
            const auto* w = static_cast<const float*>(fields[i].data);
            for (int j = 0; j < size; j++)
            {
                weight.push_back(*w);
                w++;
            }
        }
        else if (!strcmp(attrName, "bias"))
        {
            ASSERT(fields[i].type == PluginFieldType::kFLOAT32);
            int size = fields[i].length;
            bias.reserve(size);
            const auto* w = static_cast<const float*>(fields[i].data);
            for (int j = 0; j < size; j++)
            {
                bias.push_back(*w);
                w++;
            }
        }
    }

    Weights weightWeights{DataType::kFLOAT, weight.data(), (int64_t) weight.size()};
    Weights biasWeights{DataType::kFLOAT, bias.data(), (int64_t) bias.size()};

    MyCustomPlugin* obj = new MyCustomPlugin(in_channel, weightWeights, biasWeights);
    obj->setPluginNamespace(mNamespace.c_str());
    return obj;
}

PluginFieldCollection is a member variable and is also the return type of the getFieldNames member function. Its main role is to pass the weights and parameters that the plugin op needs; it is not used during actual engine inference, but it is used by the parsers (e.g. caffe2trt, onnx2trt).

When these parsers parse the op, its weights and parameters go through the Models -> TensorRT engine -> TensorRT runtime pipeline.

For example, in onnx-tensorrt the op is registered with DEFINE_BUILTIN_OP_IMPORTER; the ONNX model is then parsed and the network is built op by op according to the registered importers. Assuming the custom op is named my_custom_op, the DEFINE_BUILTIN_OP_IMPORTER(my_custom_op) implementation looks like this:

DEFINE_BUILTIN_OP_IMPORTER(my_custom_op)
{
    ASSERT(inputs.at(0).is_tensor(), ErrorCode::kUNSUPPORTED_NODE);
    ...
    const std::string pluginName = "CUSTOM-OP";
    const std::string pluginVersion = "001";

    // f holds the weights and parameters this op needs, extracted from the ONNX model
    std::vector<nvinfer1::PluginField> f;
    f.emplace_back("in_channel", &in_channel, nvinfer1::PluginFieldType::kINT32, 1);
    f.emplace_back("weight", kernel_weights.values, nvinfer1::PluginFieldType::kFLOAT32, kernel_weights.count());
    f.emplace_back("bias", bias_weights.values, nvinfer1::PluginFieldType::kFLOAT32, bias_weights.count());

    // fetch the plugin from the plugin registry and pass the weights and parameters to it
    nvinfer1::IPluginV2* plugin = importPluginFromRegistry(ctx, pluginName, pluginVersion, node.name(), f);
    RETURN_FIRST_OUTPUT(ctx->network()->addPluginV2(tensors.data(), tensors.size(), *plugin));
}

Stepping into the importPluginFromRegistry function, you can see that the parameters are packed into the fc variable and passed to the plugin through createPlugin:

nvinfer1::IPluginV2* importPluginFromRegistry(IImporterContext* ctx, const std::string& pluginName,  const std::string& pluginVersion, const std::string& nodeName, const std::vector<nvinfer1::PluginField>& pluginFields)
{
    const auto mPluginRegistry = getPluginRegistry();
    const auto pluginCreator = mPluginRegistry->getPluginCreator(pluginName.c_str(), pluginVersion.c_str(),"ONNXTRT_NAMESPACE");
    
    if(!pluginCreator)
    {
        return nullptr;
    }
    // collect the incoming weights and parameters and pass them to the plugin
    nvinfer1::PluginFieldCollection fc;
    fc.nbFields = pluginFields.size();
    fc.fields = pluginFields.data();
    return pluginCreator->createPlugin(nodeName.c_str(), &fc);
}

In the steps above, the pluginName and pluginVersion are used to look up the MyCustomPluginCreator, and the createPlugin member function is what we need to write ourselves.

Create an empty mPluginAttributes to initialize mFC:

MyCustomPluginCreator::MyCustomPluginCreator()
{
    mPluginAttributes.emplace_back(PluginField("in_channel", nullptr, PluginFieldType::kINT32, 1));
    mPluginAttributes.emplace_back(PluginField("weight", nullptr, PluginFieldType::kFLOAT32, 1));
    mPluginAttributes.emplace_back(PluginField("bias", nullptr, PluginFieldType::kFLOAT32, 1));

    mFC.nbFields = mPluginAttributes.size();
    mFC.fields = mPluginAttributes.data();
}

Deserialization: call the deserialization constructor to create the plugin.

nvinfer1::IPluginV2* deserializePlugin(const char* name, const void* serialData, size_t serialLength);

This function is called by an onnx-tensorrt conversion op named TRT_IPluginV2, which reads the serialized data from the ONNX model and deserializes it into the network.

The name here is the same as the type mentioned earlier. The createPlugin call needs to be written and invoked by yourself; wrapping it in a Create function lets you expose a library interface without exposing the plugin's internals. In essence, it just calls the constructor with the parameters packed into a struct.
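A minimal sketch of deserializePlugin, which simply forwards to the deserialization constructor shown earlier and restores the namespace:

nvinfer1::IPluginV2* MyCustomPluginCreator::deserializePlugin(const char* name, const void* serialData, size_t serialLength)
{
    // The serialized buffer was produced by MyCustomPlugin::serialize()
    MyCustomPlugin* plugin = new MyCustomPlugin(serialData, serialLength);
    plugin->setPluginNamespace(mNamespace.c_str());
    return plugin;
}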

Dynamic Shape Plugin API

Functions that are different from static shape

Static (implicit batch) vs. dynamic (explicit batch)

1. Given the number of inputs and their (possibly dynamic) dimensions, get the dynamic dimensions of the outputIndex-th output

static

nvinfer1::Dims getOutputDimensions(int index, const nvinfer1::Dims* inputs, int nbInputDims);

dynamic

nvinfer1::DimsExprs getOutputDimensions(int outputIndex, const nvinfer1::DimsExprs* inputs, int nbInputs, nvinfer1::IExprBuilder& exprBuilder);

2. enqueue and getWorkspaceSize take additional input/output information such as dimensions and data types

static

int enqueue(int batchSize, const void* const* inputs, void** outputs, void *workspace, cudaStream_t stream);

dynamic

int enqueue(const nvinfer1::PluginTensorDesc* inputDesc, const nvinfer1::PluginTensorDesc* outputDesc, const void* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream);

The difference between implicit and explicit batch is only really encountered when writing plugins.

The implicit batch of static shape means the batch value is passed in by enqueue while the remaining dimensions are fixed, so only the batch is dynamic. With static shape, the batch is not visible during TRT inference: the inputs parameter of getOutputDimensions only contains CHW, an array with concrete values and dimensions, and in the enqueue function everything is an explicit value.

The explicit batch of dynamic shape means that in getOutputDimensions the inputs parameter is NCHW, including the batch dimension. Since the input dimension values are not known in advance, the relationship between inputs and outputs is expressed through exprBuilder, which essentially provides the arithmetic operations needed for shape inference (a sketch follows this discussion). For the enqueue function, because the values are not fixed, descriptions of the inputs and outputs are required.

Static means that the shape information can be obtained in advance, while dynamic can only be obtained at runtime.
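As an illustration of how exprBuilder expresses the input/output relationship, here is a hypothetical dynamic-shape getOutputDimensions that keeps N and C and flattens H and W into one dimension (the shape rule itself is made up for this example):

nvinfer1::DimsExprs MyCustomPluginDynamic::getOutputDimensions(int outputIndex, const nvinfer1::DimsExprs* inputs, int nbInputs, nvinfer1::IExprBuilder& exprBuilder)
{
    nvinfer1::DimsExprs out;
    out.nbDims = 3;
    out.d[0] = inputs[0].d[0];   // N: may be a symbolic expression, not a concrete value
    out.d[1] = inputs[0].d[1];   // C
    // H * W, built with the expression builder's arithmetic operations
    out.d[2] = exprBuilder.operation(nvinfer1::DimensionOperation::kPROD, *inputs[0].d[2], *inputs[0].d[3]);
    return out;
}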

PluginCreatorRegistration

When the NvInferRuntimeCommon.h header is loaded you get getPluginRegistry, whose registry contains all registered IPluginCreators; at use time, fetch the corresponding IPluginCreator through the getPluginCreator function.

There are two ways to register:

1. Call the API to register

extern "C" {
    bool initLibNvInferPlugins(void* logger, const char* libNamespace)
    {
        initializePlugin<nvinfer1::plugin::GridAnchorPluginCreator>(logger, libNamespace);
        initializePlugin<nvinfer1::plugin::NMSPluginCreator>(logger, libNamespace);
        initializePlugin<nvinfer1::plugin::ReorgPluginCreator>(logger, libNamespace);
        ...
        return true;
    }
} 

​ The initializePlugin function executes the addPluginCreator function:

template <typename CreatorType>
void initializePlugin(void* logger, const char* libNamespace)
{
    PluginCreatorRegistry::getInstance().addPluginCreator<CreatorType>(logger, libNamespace);
}

​ The addPluginCreator function executes getPluginRegistry() -> registerCreator to register pluginCreator, thus completing the registration task:

void addPluginCreator(void* logger, const char* libNamespace)
{
	...
        if(mRegistryList.find(pluginType) == mRegistryList.end())
        {
            bool status = getPluginRegistry()->registerCreator(*pluginCreator, libNamespace);
            if (status)
            {
                mRegistry.push(std::move(pluginCreator));
                mRegistryList.insert(pluginType);
                verboseMsg = "Plugin creator registration succeeded - " + pluginType;
            }
            else
            {
                errorMsg = "Could not register plugin creator: " + pluginType;
            }
        }
    	else
        {
            verboseMsg = "Plugin creator already registered - " + pluginType;
        }
   ...
}

2. Register directly through REGISTER_TENSORRT_PLUGIN:

// Loading the 'NvInferRuntimeCommon.h' header gives access to 'getPluginRegistry'
extern "C" TENSORRTAPI nvinfer1::IPluginRegistry* getPluginRegistry();
namespace nvinfer1
{
template <typename T>
class PluginRegistrar{

public:

	PluginRegistrar() {getPluginRegistry()->registerCreator(instance, "");}

private:

	T instance{};

};

#define REGISTER_TENSORRT_PLUGIN(name) \
static nvinfer1::PluginRegistrar<name> pluginRegistrar##name {}
}

In other words, if we have added REGISTER_TENSORRT_PLUGIN(BatchedNMSPluginCreator) in the plugin's .h file, there is no need to write a function like the official initLibNvInferPlugins() that registers the creators one by one.

When the TRT library is loaded, this global registry is created, and the creators are registered into it at program startup.
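So registration can be as simple as adding one line next to the creator definition, for example:

// In the plugin's implementation/header file (sketch; registers into the default "" namespace)
REGISTER_TENSORRT_PLUGIN(MyCustomPluginCreator);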

How to use the registered PluginCreator?

class IPluginRegistry{

public:

	virtual bool registerCreator(IPluginCreator& creator, const char* pluginNamespace) noexcept = 0;

	virtual IPluginCreator* const* getPluginCreatorList(int* numCreators) const noexcept = 0;

	virtual IPluginCreator* getPluginCreator(const char* pluginType, const char* pluginVersion, const char* pluginNamespace = "") noexcept = 0;

};

This is why registration is needed: only a registered plugin can be found and automatically deserialized at runtime so that it can execute; otherwise the creator will not be found.

How to call a plugin you wrote yourself?

​ Use the addPluginV2 function, for example:

IPluginV2Layer* embLayer = network->addPluginV2(inputs, 3, embPlugin);
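Putting the pieces together, a typical build-time flow is to look up the registered creator, build the plugin from a PluginFieldCollection, and add it to the network. A sketch under stated assumptions (variable names such as inChannel, inputTensor, and the plugin name/version are placeholders):

// Look up the creator registered under this name/version (default "" namespace)
auto* creator = getPluginRegistry()->getPluginCreator("MyCustomPlugin", "1");

// Pack the parameters the plugin expects; here only "in_channel" as an example
int inChannel = 64;
std::vector<nvinfer1::PluginField> fields;
fields.emplace_back("in_channel", &inChannel, nvinfer1::PluginFieldType::kINT32, 1);
nvinfer1::PluginFieldCollection fc;
fc.nbFields = static_cast<int>(fields.size());
fc.fields = fields.data();

// Create the plugin instance and add it to the network
nvinfer1::IPluginV2* plugin = creator->createPlugin("my_custom_layer", &fc);
nvinfer1::ITensor* pluginInputs[] = {inputTensor};   // inputTensor: an existing ITensor* in the network
IPluginV2Layer* pluginLayer = network->addPluginV2(pluginInputs, 1, *plugin);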

How to debug a TensorRT plugin?

TRT is closed-source software and its API is relatively complex.

Under what circumstances should I debug TensorRT Plugin?

1) Regardless of whether the network is built using API or parser, after the model is converted, the result error is very large;

2) Added a custom plugin to implement operator merging, but the results are not correct;

3) After using the FP16 or INT8 optimization strategy, the accuracy of the algorithm drops a lot.

Regular, well-trained networks are suitable for INT8 optimization; other networks may lose a lot of accuracy.

Several debugging methods are recommended:

1. Use parser to convert the network, and use the dump API interface to check whether the network structure is correct;

2. If you use the plugin, you need to write unit tests;

3. General method, print output:

1) Official suggestion: mark the output of the suspicious layer as a network output (relatively tedious);

2) The author's method: add a debug plugin. Link: https://github.com/LitLeo/TensorRT_Tutorial/tree/master/resource_for_billibilli/debug_plugin

Converting a TensorRT plugin to FP16:

​ If there is no Plugin in the network, the conversion from FP32 to FP16 can be realized by directly calling the following code:

config->setFlag(BuilderFlag::kFP16);

builder->platformHasFastFp16();          // check whether the platform supports FP16

builder->platformHasFastInt8();           // check whether the platform supports INT8

​ If there are Plugins in the network, you need to pay attention to the following:

1) Things to pay attention to when writing Plugin:

(1) The enqueue function needs an additional half (fp16) version (see the sketch after this list);

(2) Pay attention to the supportsFormatCombination function: make sure the input and output types are consistent with each other and with mType.

2) For the fp16 model, is the input set to float type or half type?

Both work, but the suggestion is to set the input to float.

3) The model should be trained with mixed precision, otherwise overflow problems may occur.
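For point (1) above, here is a sketch of how a dynamic-shape enqueue might dispatch between float and half. It assumes a hypothetical templated launcher launchMyKernel<T> and that <cuda_fp16.h> is included for __half; it is an illustration, not the official implementation:

int MyCustomPluginDynamic::enqueue(const nvinfer1::PluginTensorDesc* inputDesc, const nvinfer1::PluginTensorDesc* outputDesc, const void* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream)
{
    // Total number of elements of input 0 (explicit batch: dims already include N)
    int64_t count = 1;
    for (int i = 0; i < inputDesc[0].dims.nbDims; ++i)
        count *= inputDesc[0].dims.d[i];

    if (inputDesc[0].type == nvinfer1::DataType::kHALF)
    {
        // fp16 path: launchMyKernel<__half> is a hypothetical templated kernel launcher
        launchMyKernel<__half>(static_cast<const __half*>(inputs[0]), static_cast<__half*>(outputs[0]), count, stream);
    }
    else
    {
        // fp32 path
        launchMyKernel<float>(static_cast<const float*>(inputs[0]), static_cast<float*>(outputs[0]), count, stream);
    }
    return 0;
}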

Code sample: https://github.com/NVIDIA/TensorRT/blob/7.2.1/plugin/skipLayerNormPlugin/skipLayerNormPlugin.cpp

Reference link:

https://zhuanlan.zhihu.com/p/567244140

https://www.bilibili.com/video/BV19Y411g7YY/?spm_id_from=333.999.0.0&vd_source=8002c1ea19b925cd4fa92e8ddf798043

https://zhuanlan.zhihu.com/p/297002406

https://github.com/nvidia/TensorRT
