The role of normalization and the ReLU function

        In the earlier detailed article on ResNet, we saw that every time the feature matrix passes through a convolutional layer, it goes through a normalization step (Batch Normalization, BN for short) and a ReLU (Rectified Linear Unit) activation. Why are these two operations needed? That question is what this post looks into.
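To make the pattern concrete, here is a minimal PyTorch-style sketch of one conv → BN → ReLU unit of the kind ResNet stacks; the channel counts and input size are illustrative assumptions rather than values from a specific ResNet variant.

```python
import torch
import torch.nn as nn

# One conv -> BN -> ReLU unit; 64 in/out channels are just example values.
conv_bn_relu = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(64),                                        # normalize each channel over the batch
    nn.ReLU(inplace=True),
)

x = torch.randn(8, 64, 56, 56)   # a dummy batch of feature maps
y = conv_bn_relu(x)              # output shape stays (8, 64, 56, 56)
```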

1. Why does ResNet need normalization

        Normalization helps ResNet alleviate the vanishing gradient problem and improves the stability and convergence speed of training. It is largely thanks to normalization that ResNet can train very deep networks.

        (1) Alleviating the vanishing gradient problem: In a deep neural network, as the number of layers increases, gradients can shrink until they effectively vanish, making the network hard to optimize. Normalization helps alleviate this problem so that gradients propagate better. By normalizing a layer's input, that is, subtracting the mean of the input features and dividing by their standard deviation, the data distribution stays stable and the gradient is kept from shrinking rapidly as it flows through the network (a minimal sketch of this computation follows this list).

        (2) Improving the stability of training: By normalizing the inputs of each batch, normalization keeps the scale of the data consistent and reduces the difference in data distribution between batches. This keeps the network from being overly sensitive to small changes in the input and improves its robustness and generalization.

        (3) Accelerating convergence: By alleviating the vanishing gradient problem and stabilizing training, normalization lets the network learn useful feature representations faster, which shortens training time, reduces resource consumption, and makes training more efficient.
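To ground point (1), the following is a minimal sketch of the per-channel normalization that batch normalization performs at training time, subtracting the channel mean and dividing by the channel standard deviation; the learnable scale (gamma) and shift (beta) are part of standard BN, but the variable names here are my own.

```python
import torch

def batch_norm_2d(x, gamma, beta, eps=1e-5):
    # x has shape (N, C, H, W); statistics are computed per channel
    mean = x.mean(dim=(0, 2, 3), keepdim=True)                 # mean over batch and spatial dims
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)   # variance over the same dims
    x_hat = (x - mean) / torch.sqrt(var + eps)                 # subtract mean, divide by std
    return gamma * x_hat + beta                                # learnable scale and shift

x = torch.randn(8, 64, 56, 56)
gamma = torch.ones(1, 64, 1, 1)
beta = torch.zeros(1, 64, 1, 1)
y = batch_norm_2d(x, gamma, beta)   # each channel of y now has roughly zero mean and unit variance
```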

2. The role of the ReLU function

        The ReLU function introduces nonlinearity, activates neurons selectively, mitigates the vanishing gradient problem, and promotes sparsity and stability. It is widely used across neural network models.

  1. Nonlinear activation: ReLU is a nonlinear function, so it introduces nonlinear transformations that let the network learn and represent more complex functional relationships. Compared with a linear activation (such as the identity map), ReLU's nonlinearity increases the expressive power of the network and fits nonlinear problems better.

  2. Activating neurons: ReLU sets input values less than zero to zero and leaves values greater than or equal to zero unchanged. This thresholding activates neurons with positive inputs and silences neurons with negative inputs, which sparsifies the activations and improves the representational power of the network (see the sketch after this list).

  3. Mitigating the vanishing gradient problem: ReLU is linear on the positive side (inputs greater than zero), so its gradient there is exactly 1 and does not saturate. Compared with traditional activations such as sigmoid and tanh, this lets the gradient propagate without rapidly decaying in deep networks, which improves training efficiency and convergence speed.

  4. Sparsity and stability: Because ReLU outputs zero for negative inputs, many neurons are inactive at any given time, which encourages the network to learn more discriminative features. In addition, ReLU is cheap to compute and has no extra parameters, which makes the network's computation efficient and stable.
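The short sketch below illustrates points 2 and 3: ReLU zeroes out negative inputs, passes positive ones through unchanged, and its gradient is exactly 1 wherever the input is positive, so it does not shrink the gradient there. The specific input values are arbitrary examples.

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0], requires_grad=True)
y = torch.relu(x)        # max(0, x): negatives become 0, positives pass through
y.sum().backward()

print(y.detach())   # 0 for the non-positive inputs, 1.5 and 3.0 unchanged
print(x.grad)       # 0 where x <= 0, 1 where x > 0, so the gradient is not attenuated
```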
