[Model Inference] How to implement the mish operator in TensorRT


Welcome to follow my official account [极智视界] (Jizhi Vision) for more of my notes and shared content.

  Hello everyone, my name is Jizhi Vision. In this article I introduce how to implement the mish operator in TensorRT.

  Anyone who has worked on object detection will be familiar with YOLO. yolov4 was proposed in early 2020, followed by yolov5 and a number of other variants. yolov4 packs in many tricks, one of which is the mish activation function. Mish is described in detail in the paper "Mish: A Self Regularized Non-Monotonic Activation Function"; here I give a brief introduction to the function itself and then show how to implement the mish operator in TensorRT.

1. Mathematical expression of mish function

   The mathematical expression of mish is as follows:

    mish(x) = x * tanh(ln(1 + e^(x)))

  The graphical representation of the function is as follows, where:

  • The blue curve is: mish
  • The orange curve is: ln(1 + e^(x))

   Let's see what mish looks like in yolov4:
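In yolov4's Darknet cfg, the backbone's convolutional blocks simply set activation=mish. The excerpt below is illustrative only (the concrete filters/size/stride values differ from layer to layer):

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=mish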

   Mish can also be seen as a combination of tanh and softplus. The mathematical expression of tanh is as follows:

    tanh(x) = (e^(x) - e^(-x)) / (e^(x) + e^(-x))

   The mathematical expression of softplus, which can be regarded as a smoothed version of relu, is as follows:

    softplus(x) = ln(1 + e^(x))

   Comparing the mathematical expressions of mish, tanh, and softplus above, you can easily see that mish can also be written like this:

    mish(x) = x * tanh(softplus(x))

2. Mish vs. relu

  Relu is arguably the most commonly used activation function, because it mitigates the vanishing-gradient problem and speeds up training convergence. Relu is a piecewise function with the following mathematical expression:

    relu(x) = max(0, x)

  Its graph looks like this:

   Comparing mish against relu is a head-to-head matchup. Let me take some experiments from the paper "Mish: A Self Regularized Non-Monotonic Activation Function" as illustration.

   Here is a comparison of the gradients of relu and mish; you can see that the gradient of mish is smoother.

In terms of accuracy, the improvement in network accuracy from the mish, swish, relu, and leaky relu activation functions was compared on the ImageNet-1K dataset; the data are as follows:

   The following is the comparison data on the MS-COCO object detection dataset:

  Judging from the measured accuracy improvements, mish has a very clear advantage.

   In terms of performance, relu, softplus, mish, and mish-cuda were benchmarked in the PyTorch framework on an RTX 2070 at fp32 and fp16 precision; the data are shown below. You can see that relu is faster than mish at inference, and that mish-cuda, optimized with CUDA, improves performance considerably.

3. Implementing the mish operator in TensorRT

   First, let's look at the activation operators that the TensorRT API supports directly:

//!
//! \enum ActivationType
//!
//! \brief Enumerates the types of activation to perform in an activation layer.
//!
enum class ActivationType : int32_t
{
    kRELU = 0,             //!< Rectified linear activation.
    kSIGMOID = 1,          //!< Sigmoid activation.
    kTANH = 2,             //!< TanH activation.
    kLEAKY_RELU = 3,       //!< LeakyRelu activation: x>=0 ? x : alpha * x.
    kELU = 4,              //!< Elu activation: x>=0 ? x : alpha * (exp(x) - 1).
    kSELU = 5,             //!< Selu activation: x>0 ? beta * x : beta * (alpha*exp(x) - alpha)
    kSOFTSIGN = 6,         //!< Softsign activation: x / (1+|x|)
    kSOFTPLUS = 7,         //!< Parametric softplus activation: alpha*log(exp(beta*x)+1)
    kCLIP = 8,             //!< Clip activation: max(alpha, min(beta, x))
    kHARD_SIGMOID = 9,     //!< Hard sigmoid activation: max(0, min(1, alpha*x+beta))
    kSCALED_TANH = 10,     //!< Scaled tanh activation: alpha*tanh(beta*x)
    kTHRESHOLDED_RELU = 11 //!< Thresholded ReLU activation: x>alpha ? x : 0
};
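As a quick illustration (a minimal sketch of my own, assuming network is an nvinfer1::INetworkDefinition* and input is an nvinfer1::ITensor* defined elsewhere), using one of these built-in activations is a single call:

// Add a built-in relu activation to the network
nvinfer1::IActivationLayer *relu = network->addActivation(*input, nvinfer1::ActivationType::kRELU);
nvinfer1::ITensor *reluOut = relu->getOutput(0);  // feed this tensor to the next layer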

   As you can see, for relu, sigmoid, tanh, and so on you don't need to write anything yourself; just call the TRT API directly. Mish, however, is not directly supported, so there are basically two ways to implement it with TRT:

   (1) Combine existing operators: mish can be assembled from tanh and softplus;

   (2) Implement it with a CUDA kernel and register it into TRT as a plugin.

Both approaches are described below.

3.1 Implementation by combining existing operators

   This one is actually easy to write. Look again at the mathematical expression of mish:

    mish(x) = x * tanh(softplus(x))

  So the basic idea is: call a softplus first, feed its result into a tanh, and then multiply the tanh output elementwise with the original input; that product is equivalent to the output of a mish. The key code is as follows:

// ---------------- softplus ----------------
// Note: softplus in TRT is parametric: alpha * log(exp(beta * x) + 1)
nvinfer1::IActivationLayer *activationSP = network->addActivation(*Layers[inputName], nvinfer1::ActivationType::kSOFTPLUS);
// Set alpha and beta to 1 to get the standard softplus: log(exp(x) + 1)
activationSP->setAlpha(1);
activationSP->setBeta(1);

// ---------------- tanh ----------------
nvinfer1::ITensor *activationSP_Out = activationSP->getOutput(0);
nvinfer1::IActivationLayer *activationTanh = network->addActivation(*activationSP_Out, nvinfer1::ActivationType::kTANH);

// ---------- x * tanh(softplus(x)) ----------
// mish multiplies the tanh result elementwise with the original input
nvinfer1::IElementWiseLayer *mish = network->addElementWise(*Layers[inputName], *activationTanh->getOutput(0), nvinfer1::ElementWiseOperation::kPROD);
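The result is then used like any other layer output: grab the tensor from the elementwise layer and feed it to whatever comes next (variable names follow the snippet above):

nvinfer1::ITensor *mishOut = mish->getOutput(0);  // equivalent to the output of a mish activation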

  That completes mish using a combination of existing TensorRT operators.

3.2 CUDA + plugin implementation

   Convert mish into its equivalent closed-form mathematical expression:

    mish(x) = x * tanh(ln(1 + e^(x)))
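As a side note of my own (not from the original post): if you want the kernel to avoid calling tanh, expand tanh(y) = (e^(2y) - 1) / (e^(2y) + 1) with y = ln(1 + e^(x)), so that e^(y) = 1 + e^(x) and

    mish(x) = x * ((1 + e^(x))^2 - 1) / ((1 + e^(x))^2 + 1)

which needs only a single exp per element.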

   The basic idea is to implement mish directly in CUDA, so that the original tanh + softplus combination, which takes two operators, becomes a single operator. I won't dwell on how to write the CUDA kernel itself here; instead, let's look at how to register the .cu implementation into TensorRT through a plugin.

  First you need a header, something like this:

/// mish.h
#include <NvInfer.h>
#include <NvInferPlugin.h>

class MishLayerPlugin : public nvinfer1::IPluginExt
{
    // ... the IPluginExt interface overrides go here ...
    void mish_infer(...);
};

   Then there is the .cu file, which contains the GPU implementation of the operator (the mish_infer above); it looks roughly like this:

/// mish.cu
#include "mish.h"

__global__ void mish(...)
{
    ...;
}

void MishLayerPlugin::mish_infer(...)
{
    mish<<<xx, xx>>>(...);
}
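For reference, here is a minimal sketch of my own (not the author's actual kernel) of what the kernel and its launcher could look like, using the expression above; the function names, launch configuration, and float-only data type are all assumptions:

#include <cuda_runtime.h>

// Illustrative mish kernel: out[i] = in[i] * tanh(ln(1 + exp(in[i])))
__global__ void mish_kernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        float x = in[i];
        float sp = logf(1.0f + expf(x));  // softplus(x)
        out[i] = x * tanhf(sp);           // mish(x) = x * tanh(softplus(x))
    }
}

// Illustrative launcher: one thread per element
void launch_mish(const float *in, float *out, int n, cudaStream_t stream)
{
    const int block = 256;
    const int grid = (n + block - 1) / block;
    mish_kernel<<<grid, block, 0, stream>>>(in, out, n);
}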

  Finally, a .cpp registers the operator into TensorRT through the plugin interface; it looks roughly like this:

/// tensorrt-mish.cpp

#include "mish.h"

void addmish_layer(...)
{
    // The plugin runs in fp32
    nvinfer1::DataType Dtype = nvinfer1::DataType::kFLOAT;
    // Create the plugin object and add it to the network as a plugin layer
    nvinfer1::IPluginExt *mish = new MishLayerPlugin(xxx, Dtype);
    nvinfer1::IPluginLayer *mish_layer = m_network->addPluginExt(&Layers[inputName], 1, *mish);
    ...
}
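Note that IPluginExt and addPluginExt belong to the older TensorRT plugin interface; recent TensorRT releases use the IPluginV2 family together with INetworkDefinition::addPluginV2 instead, so the skeleton above needs to be adapted accordingly if you are on a newer version.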

  Okay, that's it for today~ The above shared how to implement the mish operator in TensorRT. I hope my sharing helps you a little in your studies.


