Deep learning foundations: the BN layer is not suitable for every deep learning task

Batch Norm is one of the most important techniques in deep learning. It makes deeper networks easier to train, accelerates convergence, and also has a certain regularization effect that helps prevent overfitting. It is widely used in CNN-based classification tasks.
However, in image super-resolution and image generation (generative adversarial networks), Batch Norm performs poorly: adding it can make training slow, unstable, or even divergent.

Let's look at why in more detail.

The BN layer computes the mean and variance from a mini-batch of data rather than from the entire training set, which is equivalent to injecting noise into the gradient computation. BN is therefore unsuitable for noise-sensitive tasks such as reinforcement learning and generative models.
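A minimal PyTorch sketch of this batch dependence (the shapes and toy batches are illustrative assumptions): in training mode a BN layer normalizes with the statistics of the current mini-batch, so the very same sample comes out differently depending on which other samples it happens to be batched with.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(num_features=1)
bn.train()  # training mode: normalize with mini-batch statistics

x = torch.randn(1, 1, 4, 4)                        # one fixed sample
batch_a = torch.cat([x, torch.randn(7, 1, 4, 4)])  # x grouped with 7 random peers
batch_b = torch.cat([x, torch.randn(7, 1, 4, 4)])  # same x, different peers

out_a = bn(batch_a)[0]
out_b = bn(batch_b)[0]

# The same input is normalized differently depending on its batch mates,
# which is exactly the "noise" BN injects into the computation:
print((out_a - out_b).abs().max())  # clearly greater than zero
```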

In image super-resolution, the image output by the network is required to be consistent with the input in color, contrast, and brightness; only the resolution and some details should change. Batch Norm, however, acts on an image much like a contrast stretch: after any image passes through Batch Norm, its color distribution is normalized, which destroys the image's original contrast information, so adding Batch Norm degrades the quality of the network's output. Although the scale and shift parameters in Batch Norm can in principle offset the normalization, learning to do so increases the difficulty and time of training, so it is better not to use it at all. There is, however, one kind of architecture in which it can be used: the residual network (ResNet), and then only inside the residual blocks, as in SRResNet, a residual network for image super-resolution. Why can this kind of network use Batch Norm? One explanation is that the contrast information of the image can be transmitted directly through the skip connection, so there is no need to worry about Batch Norm destroying it.
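A minimal PyTorch sketch of such a residual block, loosely following the SRResNet layout (the channel count and activation here are illustrative assumptions, not the paper's exact configuration): Batch Norm sits only on the residual branch, while the identity skip connection carries the input, and hence its contrast information, through unnormalized.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """SRResNet-style residual block (a sketch, not the paper's exact code)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # BN appears only inside the residual branch.
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection passes x through untouched, preserving
        # the original contrast/brightness statistics of the features.
        return x + self.body(x)
```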

Based on this idea, we can also explain from another angle why Batch Norm is so effective in image classification tasks. Classification does not need to preserve an image's contrast information; the structural information of the image is enough to decide the class. Normalizing the images through Batch Norm therefore reduces the difficulty of training, and even some inconspicuous structures become more salient after Batch Norm, since their contrast is stretched.

Why, then, can Batch Norm be used for photo style transfer? The reason is that the color, contrast, and brightness of the stylized image are unrelated to the content image and depend only on the style image; only the structural information of the content image is expressed in the final output. It is therefore not surprising to see Batch Norm, or Instance Norm, in style-transfer networks. Instance Norm is an even more direct normalization than Batch Norm, operating on a single image, sometimes even without the scale and shift parameters.
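A small PyTorch comparison to make the difference concrete (the batch shape and scaling are illustrative): Instance Norm normalizes each image per channel on its own, whereas Batch Norm only normalizes across the whole mini-batch, so an individual image is generally left with a nonzero mean.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32) * 5 + 2      # batch with non-trivial mean/variance

bn = nn.BatchNorm2d(3)                      # stats over (batch, H, W), per channel
inorm = nn.InstanceNorm2d(3, affine=False)  # stats over (H, W), per image and channel;
                                            # affine=False: no scale and shift

y_in = inorm(x)
print(y_in[0].mean(dim=(1, 2)))   # per-channel means of one image: ~0
print(y_in[0].std(dim=(1, 2)))    # per-channel stds of one image:  ~1

y_bn = bn(x)
print(y_bn[0].mean(dim=(1, 2)))   # generally not 0: BN only centers the batch
```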

To put it more broadly, Batch Norm ignores the absolute differences between image pixels (or features), because the mean is zeroed and the variance is normalized, and keeps only the relative differences. In tasks that do not require absolute differences, such as classification, it is icing on the cake; in tasks such as image super-resolution that do rely on absolute differences, Batch Norm only adds to the confusion.
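A tiny demonstration of this point, using a hand-rolled zero-mean, unit-variance normalization of the kind BN applies (the 16×16 image is an illustrative assumption): two images that differ only in brightness and contrast collapse to the same tensor, so the absolute difference is gone for every layer downstream.

```python
import torch

def normalize(img: torch.Tensor) -> torch.Tensor:
    """Zero-mean, unit-variance normalization, as BN does (no scale/shift)."""
    return (img - img.mean()) / img.std()

img = torch.rand(16, 16)
brighter = img * 2.0 + 0.5   # same structure, different brightness and contrast

# Both normalize to the same tensor: the absolute difference is irrecoverable.
print(torch.allclose(normalize(img), normalize(brighter), atol=1e-6))  # True
```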

Origin: blog.csdn.net/weixin_43507744/article/details/127619382