The Mish activation function in detail

Activation function Mish


  People familiar with YOLO, or with artificial intelligence in general, will know that YOLOv4 came out some time ago and delivered a qualitative leap in accuracy over YOLOv3, which is quite remarkable. Driven by curiosity, I looked into YOLOv4. Compared with YOLOv3, the main differences are: the network is deeper, the activation function is changed from Leaky ReLU to Mish, and SPP (spatial pyramid pooling) is used. The following focuses on the Mish activation function.
  In the paper "Mish: A Self Regularized Non-Monotonic Neural Activation Function", Diganta Misra writes: "Over the years of theoretical research, many activation functions have been proposed, however, only a few are widely used in mostly all applications which include ReLU (Rectified Linear Unit), TanH (Tan Hyperbolic), Sigmoid, Leaky ReLU and Swish." and "For instance, in Squeeze Excite Net-18 for CIFAR 100 classification, the network with Mish had an increase in Top-1 test accuracy by 0.494% and 1.671% as compared to the same network with Swish and ReLU respectively." In other words, the activation functions still in wide use are ReLU, TanH, Sigmoid, Leaky ReLU and Swish; the paper proposes a new activation function, Mish, and in the Squeeze-Excite Net-18 experiment on CIFAR-100 it improves Top-1 test accuracy by 0.494% over Swish and by 1.671% over ReLU.
Mish's formula: f(x) = x * tanh(softplus(x))
Building up the Mish formula step by step:

The input to Mish is x:
1. softplus: y1 = ln(1 + e^x)
2. tanh:     y2 = (e^y1 - e^(-y1)) / (e^y1 + e^(-y1))
3. Mish:     y  = y2 * x
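Putting the three steps together (a consolidation added here for clarity; it just substitutes e^y1 = 1 + e^x into the tanh from step 2), the whole chain collapses into a single expression:

f(x) = x * tanh(ln(1 + e^x)) = x * ((1 + e^x)^2 - 1) / ((1 + e^x)^2 + 1)

For example, at x = 1 this gives 1 * ((1 + e)^2 - 1) / ((1 + e)^2 + 1) ≈ 0.865, which matches x * tanh(softplus(1)).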

The following figure compares Mish with other commonly used activation functions; it shows that the curves of Mish and Swish are very similar:
(Figure: Comparison of common activation functions)

The following figure compares the outputs obtained after passing data through three different activation functions, ReLU, Swish, and Mish; the Mish output is noticeably smoother than that of ReLU and Swish.
(Figure: Comparison of ReLU, Swish and Mish outputs)
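For readers who cannot see the figure, a similar comparison can be reproduced with a minimal NumPy sketch (relu_fun and swish_fun below are helper names of my own, and Swish is taken here as x * sigmoid(x) with beta = 1):

import numpy as np
import matplotlib.pyplot as plt

def relu_fun(x):
    # ReLU: max(0, x)
    return np.maximum(0.0, x)

def swish_fun(x):
    # Swish with beta = 1: x * sigmoid(x)
    return x / (1 + np.exp(-x))

def mish_fun(x):
    # Mish: x * tanh(softplus(x))
    return x * np.tanh(np.log(1 + np.exp(x)))

if __name__ == '__main__':
    x = np.linspace(-6, 6, 200)
    plt.plot(x, relu_fun(x), label='ReLU')
    plt.plot(x, swish_fun(x), label='Swish')
    plt.plot(x, mish_fun(x), label='Mish')
    plt.legend()
    plt.show()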

(Figure: Properties of Mish)

Code example

import numpy as np
import matplotlib.pyplot as plt

def mish_fun(x):
    # softplus: ln(1 + e^x)
    tmp = np.log(1 + np.exp(x))
    # tanh of the softplus result
    tmp = np.tanh(tmp)
    # Mish: x * tanh(softplus(x))
    return tmp * x

if __name__ == '__main__':

    x = np.array([-15, -11, -10.5, -10, -4.5, -4, -3.5, -3, -2.5, -2,
                  -1.5, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10.5])
    y = mish_fun(x)
    plt.plot(x, y)
    plt.show()
    print(x)
    print(y)

(Figure: Plot of the Mish activation function)
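One practical note of my own (not from the original post): np.exp(x) overflows for large positive x, so when applying Mish to unbounded inputs it is safer to compute the softplus step with np.logaddexp, which evaluates ln(e^0 + e^x) without forming e^x explicitly:

import numpy as np

def mish_stable(x):
    # softplus(x) = ln(1 + e^x) = ln(e^0 + e^x) = logaddexp(0, x), computed without overflow
    return x * np.tanh(np.logaddexp(0, x))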


Origin blog.csdn.net/CFH1021/article/details/106083755