[Activation function] PReLU activation function

1. Introduction

        The PReLU (Parametric Rectified Linear Unit) activation function is an improvement over the ReLU (Rectified Linear Unit) activation function. It was proposed by He et al. in 2015 and aims to address some of the limitations of ReLU.

import torch

# Define the PReLU activation function
prelu_activation = torch.nn.PReLU(num_parameters=1, init=0.25, device=None, dtype=None)
  • num_parameters: the number of learnable α parameters. The default is 1 (a single α shared across all inputs); it can also be set to the number of input channels, giving each channel its own learnable α.

  • init: the initial value of the learnable parameter. The default is 0.25. This value is used to initialize the learnable α parameter(s).

  • device: specifies the device on which the parameter is created. The default is None, which means the current device is used.

  • dtype: Specifies the data type of the parameter. The default is None, indicating that the default data type is used.
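A quick usage sketch of these options (my own example; the tensor shapes and channel count are illustrative values, not from the original post):

import torch

# One shared learnable alpha for all channels (the default)
prelu_shared = torch.nn.PReLU(num_parameters=1, init=0.25)

# One learnable alpha per input channel (here 16 channels, an example value)
prelu_per_channel = torch.nn.PReLU(num_parameters=16, init=0.25)

x = torch.randn(8, 16, 32, 32)          # example batch: N=8, C=16, H=W=32
y_shared = prelu_shared(x)
y_per_channel = prelu_per_channel(x)

print(prelu_shared.weight.shape)        # torch.Size([1])  -> a single alpha
print(prelu_per_channel.weight.shape)   # torch.Size([16]) -> one alpha per channel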

2. Formula

$f(x)=\operatorname{PReLU}(x)=\left\{\begin{array}{ll}\alpha x & \text { for } x<0 \\ x & \text { for } x \geq 0\end{array}\right. =\max (\alpha x, x)$

Here $x$ is the input value and $\alpha$ is a learnable parameter, usually a constant less than 1.

        This parameter enables PReLU to adjust its output for negative inputs instead of outputting 0 directly like the traditional ReLU. When $x \geq 0$, PReLU behaves the same as the standard ReLU, i.e. it outputs the input value directly. When $x < 0$, PReLU outputs $\alpha x$. This design makes PReLU more flexible than the standard ReLU when dealing with negative inputs.
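To make the piecewise definition concrete, here is a minimal sketch (my own illustration, with α fixed at 0.25) that evaluates the formula directly and compares it with PyTorch's built-in torch.nn.functional.prelu:

import torch
import torch.nn.functional as F

alpha = torch.tensor(0.25)                      # slope for negative inputs (fixed here for illustration)
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# Piecewise definition: alpha * x when x < 0, x when x >= 0
manual = torch.where(x >= 0, x, alpha * x)

# Equivalent built-in functional form; the weight is a 1-element tensor
builtin = F.prelu(x, alpha.reshape(1))

print(manual)                                   # tensor([-0.5000, -0.1250,  0.0000,  0.5000,  2.0000])
print(torch.allclose(manual, builtin))          # True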

3. Image
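A small matplotlib sketch (my own illustration; the α values shown are arbitrary example choices) reproduces the typical shape of the PReLU curve:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 200)

# Draw PReLU for a few fixed alpha values to show how the negative slope changes
for alpha in (0.05, 0.25, 0.5):
    y = np.where(x >= 0, x, alpha * x)
    plt.plot(x, y, label=f"alpha = {alpha}")

plt.axhline(0, color="gray", linewidth=0.5)
plt.axvline(0, color="gray", linewidth=0.5)
plt.title("PReLU for different values of alpha")
plt.legend()
plt.show()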

4. Features  

  • Improves the vanishing gradient problem: by providing a non-zero slope (controlled by the parameter α) for negative input values, PReLU helps alleviate ReLU's vanishing gradient problem in the negative input region.

  • Parameterization: the parameter α is learnable, which allows the network to adaptively adjust the shape of the activation function and improves the flexibility of the model. PReLU is a good choice when the network needs to adapt the behavior of its activation function (a short training sketch follows this list).

  • Computing resources permitting: although PReLU adds some computational overhead (because α has to be learned), this is usually acceptable when computing resources are sufficient.
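As a rough illustration of the parameterization point above, the following sketch (layer sizes, learning rate, and data are made-up example values) shows α being updated by the optimizer together with the other weights:

import torch

# A tiny model with one learnable alpha per hidden feature (sizes are example values)
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.PReLU(num_parameters=8, init=0.25),
    torch.nn.Linear(8, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 4)
target = torch.randn(16, 1)

alpha_before = model[1].weight.detach().clone()
loss = torch.nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()

print(alpha_before)
print(model[1].weight.detach())   # typically different: alpha was updated along with the other weights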

5. Comparing PReLU and Leaky ReLU

The main difference between the PReLU (Parametric Rectified Linear Unit) activation function and the Leaky ReLU activation function lies in how the negative-input slope is parameterized and how flexible that parameterization is.

        PReLU mathematical expression:

$f(x)=\operatorname{PReLU}(x)=\left\{\begin{array}{ll}\alpha x & \text { for } x<0 \\ x & \text { for } x \geq 0\end{array}\right. =\max (\alpha x, x)$

         Leaky ReLU mathematical expression:

$f(x)=\operatorname{LeakyReLU}(x)=\left\{\begin{array}{ll}\alpha x & \text { for } x<0 \\ x & \text { for } x \geq 0\end{array}\right. =\max (\alpha x, x)$

        It can be seen from the mathematical expressions that the two activation functions are computed in almost exactly the same way. The only difference is that the slope of the negative input part of PReLU (i.e. α) is a learnable parameter, while the slope of the negative input part of Leaky ReLU (i.e. α) is a fixed constant set at initialization.

  1. Parameterization:

    • PReLU: in PReLU, the slope of the negative input part (i.e. α) is a learnable parameter. This means that during training the value of α is automatically adjusted based on the data, allowing the network to adapt the shape of its activation function to the training data.
    • Leaky ReLU: in Leaky ReLU, the slope of the negative input part is also non-zero, but it is fixed, usually a very small constant (e.g. 0.01). This means that the shape of the activation function remains unchanged throughout training.
  2. Flexibility and adaptability:

    • Because PReLU's α is learnable, it can in theory provide more flexibility, so the activation function can adapt better to a specific data set and task.
    • Leaky ReLU is more straightforward to implement because of its simplicity, but it lacks the adaptability that PReLU provides.
  3. Application in practice:

    • PReLU is usually more popular in deep networks, especially when the task requires high flexibility of the model, such as large-scale image or speech recognition tasks.
    • Due to its simplicity, Leaky ReLU is suitable for scenarios that require fast implementation and less parameter adjustment.

        In summary, PReLU gives the activation function greater flexibility by introducing a learnable parameter, while Leaky ReLU offers a simple but stable fixed non-zero slope for handling the vanishing-gradient problem in the negative input part of ReLU. Which one to choose depends mainly on the requirements of the specific task and on the available computing resources.
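The practical difference can be seen in a short sketch (a minimal example of my own, not from the original post): the PReLU slope appears among the module's trainable parameters and receives gradients, while the Leaky ReLU slope does not:

import torch

prelu = torch.nn.PReLU(init=0.25)                  # alpha is a learnable Parameter
leaky = torch.nn.LeakyReLU(negative_slope=0.01)    # slope is a fixed constant

print(list(prelu.parameters()))                    # a single Parameter holding tensor([0.2500])
print(list(leaky.parameters()))                    # [] -> nothing to learn

# alpha takes part in backpropagation like any other weight
x = torch.randn(4)
prelu(x).sum().backward()
print(prelu.weight.grad)                           # gradient w.r.t. alpha (non-None)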

6. Comparing ReLU and other variants

PReLU is one of several variants of ReLU. Other common variants include the following; a short comparison sketch follows the list:

  • ReLU: in its simplest form, the output is the positive part of the input.
  • Leaky ReLU: uses a fixed small non-zero slope (e.g. 0.01) for the negative input part; similar to PReLU, but the slope is not learnable.
  • ELU (Exponential Linear Unit): applies an exponential curve to negative inputs, so the output smoothly saturates towards a negative constant rather than dropping to zero abruptly.
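A short comparison sketch (illustrative inputs and default settings of my own choosing) that evaluates these variants side by side on the same values:

import torch

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])

activations = {
    "ReLU": torch.nn.ReLU(),
    "LeakyReLU": torch.nn.LeakyReLU(negative_slope=0.01),
    "ELU": torch.nn.ELU(alpha=1.0),
    "PReLU": torch.nn.PReLU(init=0.25),
}

# Only the negative inputs distinguish the variants; positive inputs pass through unchanged
for name, act in activations.items():
    print(f"{name:10s}", act(x).detach())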

For more deep learning content, please visit my homepage. The following are quick links:

[Activation Function] Several activation functions you must know in deep learning: Sigmoid, Tanh, ReLU, LeakyReLU and ELU activation functions (2024 latest compilation) - CSDN Blog
