The past and present of Dropout


Dropout is a family of randomization techniques used during neural network training or inference that has attracted widespread interest among researchers and is widely used for neural network regularization, model compression, and other tasks. While Dropout was originally tailored to dense neural network layers, recent advances have made it applicable to convolutional and recurrent layers as well. This paper summarizes the development history, applications, and current research directions of Dropout methods, and introduces the most important methods proposed by researchers in detail.


Figure 1: Some of the currently proposed dropout methods, and the theoretical progress of dropout methods from 2012 to 2019. 

Standard dropout

The original Dropout method, proposed in 2012, provides a simple technique for avoiding overfitting in feedforward neural networks [1]. During each training iteration, each neuron in the network is dropped with probability p. Once training is complete, the whole network is used, but each neuron's output is multiplied by the probability 1 − p that the neuron was kept during training. This compensates for the fact that the test-time network is larger than the thinned networks seen during training (no neurons are dropped), and can be interpreted as averaging over the ensemble of networks that could appear at training time. The dropout probability may differ per layer; the original Dropout paper suggested p = 0.2 for the input layer and p = 0.5 for hidden layers. Neurons in the output layer are not dropped. This technique is usually referred to simply as Dropout, but in this article we call it standard Dropout to distinguish it from other Dropout methods. The method is shown in Figure 2.


Figure 2: Example of standard dropout. Left: a fully connected network; right: the same network after dropping neurons with probability 0.5. No dropout is applied to the output layer.

Mathematically, the behavior of a neural network layer trained with standard dropout can be written as:

$$\mathbf{y} = f(\mathbf{W}\mathbf{x}) \circ \mathbf{m} \tag{1}$$

where f(·) is the activation function, x is the layer input, W is the layer's weight matrix, y is the layer output, m is the layer's dropout mask, and ∘ denotes element-wise multiplication. Each element of m is 1 with probability 1 − p (that is, 0 with the drop probability p). During the testing phase, the output of this layer can be written as:

$$\mathbf{y} = (1 - p)\, f(\mathbf{W}\mathbf{x}) \tag{2}$$
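As a concrete illustration, here is a minimal NumPy sketch of Equations (1) and (2) for a single dense layer. The ReLU activation, layer sizes, and random seed are illustrative assumptions, not part of the original method.

```python
import numpy as np

np.random.seed(0)

def relu(z):
    return np.maximum(z, 0.0)

def dense_dropout_train(x, W, p=0.5):
    # Equation (1): y = f(Wx) * m, each mask element is 1 with probability 1 - p
    m = (np.random.rand(W.shape[0]) >= p).astype(x.dtype)
    return relu(W @ x) * m

def dense_dropout_test(x, W, p=0.5):
    # Equation (2): the full layer is used, scaled by the keep probability 1 - p
    return (1.0 - p) * relu(W @ x)

W = np.random.randn(4, 8)
x = np.random.randn(8)
print(dense_dropout_train(x, W, p=0.5))
print(dense_dropout_test(x, W, p=0.5))
```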

Dropout methods for training

This section describes important dropout methods. Like standard dropout, they are typically used to regularize dense feedforward layers during training. Most of them are directly inspired by standard dropout and seek to improve its speed or the effectiveness of its regularization.

One of the earliest variants of standard dropout is DropConnect, proposed by Wan et al. [3] in 2013. It generalizes dropout by setting individual weights or biases to 0 with a certain probability, rather than setting neuron outputs to 0. During training, the output of a layer can therefore be written as:

$$\mathbf{y} = f\big((\mathbf{W} \circ \mathbf{M})\,\mathbf{x}\big) \tag{3}$$

The variables are defined as in Equation (1), except that a dropout mask matrix M is used instead of a mask vector. DropConnect is shown in Figure 3.


Figure 3: Example of DropConnect. The network on the right sets weights to 0 with probability 0.5.
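A minimal sketch of the DropConnect training pass from Equation (3), again with illustrative layer sizes and a ReLU activation; the moment-matching inference procedure proposed by Wan et al. is omitted here.

```python
import numpy as np

np.random.seed(0)

def relu(z):
    return np.maximum(z, 0.0)

def dropconnect_train(x, W, p=0.5):
    # Equation (3): y = f((W * M) x) -- the mask M zeroes individual weights
    M = (np.random.rand(*W.shape) >= p).astype(W.dtype)
    return relu((W * M) @ x)

W = np.random.randn(4, 8)
x = np.random.randn(8)
print(dropconnect_train(x, W, p=0.5))
```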

Standout [4] is a dropout method that attempts to improve on standard dropout by adaptively selecting which neurons to drop, rather than dropping them at random. This is achieved by overlaying a binary belief network on the neural network, which controls its architecture. For each weight in the original neural network, Standout adds a corresponding weight parameter in the binary belief network. During training, the output of a layer can be written as:

$$\mathbf{y} = f(\mathbf{W}\mathbf{x}) \circ \mathbf{m}, \qquad P(m_i = 1) = g(\mathbf{W}_s\,\mathbf{x})_i \tag{4}$$

The variables are defined as in Equation (1), except that W_s denotes the belief-network weights acting on this layer and g(·) is the belief network's activation function.
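A rough sketch of the idea behind Standout, assuming a sigmoid belief network that maps the layer input to per-neuron keep probabilities. The belief-network weights W_belief are drawn randomly here purely for illustration; in the original method they are tied to, or learned alongside, the layer weights.

```python
import numpy as np

np.random.seed(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def standout_train(x, W, W_belief):
    # The belief network computes a keep probability for every unit,
    # then the mask is sampled from those adaptive probabilities.
    keep_prob = sigmoid(W_belief @ x)                          # g(W_s x)
    m = (np.random.rand(keep_prob.shape[0]) < keep_prob).astype(x.dtype)
    return relu(W @ x) * m

W = np.random.randn(4, 8)
W_belief = np.random.randn(4, 8)   # stand-in; Standout can derive this from W itself
x = np.random.randn(8)
print(standout_train(x, W, W_belief))
```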

Fast Dropout [5] provides a faster way to achieve dropout-like regularization by interpreting dropout from a Bayesian perspective. The authors show that the output of a layer with dropout can be viewed as a sample from an underlying distribution that can be approximated by a Gaussian. One can then either sample directly from this distribution or use its parameters to propagate information about the whole ensemble of possible dropped-out networks. This allows faster training than standard dropout, in which only one member of the set of possible networks is sampled at a time.
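A sketch of the Gaussian approximation underlying Fast Dropout: rather than sampling a Bernoulli mask, the pre-activation is drawn from a Gaussian whose mean and variance follow from the mask statistics. The ReLU activation and layer sizes are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)

def relu(z):
    return np.maximum(z, 0.0)

def fast_dropout_layer(x, W, p=0.5):
    # Keep probability q = 1 - p. The pre-activation under dropout,
    # sum_j W_ij * x_j * m_j, is approximated by a Gaussian with
    # mean = q * Wx and variance = q * (1 - q) * (W^2 x^2).
    q = 1.0 - p
    mean = q * (W @ x)
    var = q * (1.0 - q) * ((W ** 2) @ (x ** 2))
    z = mean + np.sqrt(var) * np.random.randn(*mean.shape)   # one draw from the approximation
    return relu(z)

W = np.random.randn(4, 8)
x = np.random.randn(8)
print(fast_dropout_layer(x, W, p=0.5))
```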

Another approach inspired by the Bayesian understanding of dropout is variational dropout, proposed by Kingma et al. [6] (not to be confused with the work of Gal and Ghahramani [13]). The authors show that a variant of dropout that uses Gaussian multiplicative noise (proposed by Srivastava et al. [8]) can be interpreted as a variational method given a particular prior over the network weights and a particular variational objective. They then derive an adaptive dropout scheme that automatically determines an effective dropout probability for an entire network, or for individual layers or neurons. This can be an improvement over existing approaches that fix the dropout rate deterministically, for example by hand or by grid search. Concrete Dropout [20] is another method that automatically tunes the dropout probability.
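A sketch of the multiplicative Gaussian noise view that variational dropout builds on. Here `alpha` is a fixed value for illustration; in variational dropout it is a learnable parameter (per network, layer, or neuron) optimized through the variational objective.

```python
import numpy as np

np.random.seed(0)

def gaussian_dropout(h, alpha):
    # Multiplicative noise ~ N(1, alpha); alpha = p / (1 - p) matches the
    # noise level of Bernoulli dropout with drop probability p.
    noise = 1.0 + np.sqrt(alpha) * np.random.randn(*h.shape)
    return h * noise

h = np.random.randn(4)
print(gaussian_dropout(h, alpha=1.0))   # alpha = 1.0 corresponds to p = 0.5
```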

Convolutional layers

The naive way to apply Dropout to convolutional neural networks (CNNs) is to randomly drop pixels of a feature map or input image. This does not reduce overfitting significantly, mainly because dropped pixels are highly correlated with their neighbors [21]. Recently, however, researchers have made many promising advances in using dropout as a regularization method when training CNNs.

Max-pooling Dropout [12] is a method that preserves the behavior of the max-pooling layer while also allowing other, smaller feature values to affect the pooling output with some probability. The operator masks a subset of the feature values before performing the max-pooling operation, as shown in Figure 4.


Figure 4: Max Pooling Dropout in Convolutional Neural Networks [12].
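A toy NumPy sketch of max-pooling dropout on a single-channel feature map: activations are masked before the max is taken, so smaller values can occasionally determine the pooled output. The 4x4 map and 2x2 pooling window are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)

def max_pooling_dropout(feature_map, pool=2, p=0.5):
    # Mask activations with probability p *before* max pooling, so that
    # smaller values in each window can occasionally win the max.
    h, w = feature_map.shape
    m = (np.random.rand(h, w) >= p).astype(feature_map.dtype)
    masked = feature_map * m
    return masked.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))

fmap = np.abs(np.random.randn(4, 4))   # a toy non-negative feature map (e.g. after ReLU)
print(max_pooling_dropout(fmap, pool=2, p=0.5))
```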

In the paper "Analysis on the Dropout Effect in Convolutional Neural Networks" [23], the authors propose a dropout method in which the dropout probability changes over the course of training. The probability of dropping a neuron is sampled from a uniform or normal distribution, which is equivalent to adding noise to the output feature maps of each layer and improves the network's robustness to noisy variations of the input images [23]. The authors also propose "max-drop", in which high activation values, selected across feature maps or channels, are preferentially dropped [23]. The experiments in [23] show that the performance of the proposed methods is comparable to that of "spatial dropout".
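A sketch of the core idea: the drop probability itself is re-sampled each iteration (here from a uniform distribution, with an assumed upper bound `max_rate`) before the feature map is masked.

```python
import numpy as np

np.random.seed(0)

def stochastic_rate_dropout(feature_map, max_rate=0.5):
    # Re-sample the drop probability each training iteration,
    # then mask the feature map with that rate.
    p = np.random.uniform(0.0, max_rate)
    m = (np.random.rand(*feature_map.shape) >= p).astype(feature_map.dtype)
    return feature_map * m

fmap = np.random.randn(1, 3, 8, 8)   # (batch, channels, height, width)
print(stochastic_rate_dropout(fmap, max_rate=0.5).shape)
```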

Cutout is another dropout-based regularization and data augmentation method for training CNNs [24]. It applies a random square mask to a region of each input image. Unlike common methods that apply dropout at the feature-map level, this method applies the mask directly to the input image. The main motivation behind Cutout is to remove visual features that would have high activation values in later layers of the CNN [24]. Surprisingly, masking the input image in this way achieves comparable performance to feature-level masking at a lower computational cost.
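A minimal Cutout-style sketch that zeroes a square patch at a random position of an input image; the patch size and image shape are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)

def cutout(image, size=8):
    # Zero out a square patch centered at a random location of the image;
    # the patch is clipped where it extends past the image border.
    img = image.copy()
    h, w = img.shape[:2]
    cy, cx = np.random.randint(h), np.random.randint(w)
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    img[y0:y1, x0:x1] = 0.0
    return img

image = np.random.rand(32, 32, 3)   # a toy CIFAR-sized image
print(cutout(image, size=8).mean())
```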

Recurrent layers

In general, the feed-forward dropout methods described above can be applied to the feed-forward connections of networks that contain recurrent layers, so research in this area has focused on applying dropout to the recurrent connections. Applying standard dropout to these connections works poorly, because the noise injected at every time step makes it difficult for the network to retain long-term memory [28]. However, dropout methods designed specifically for recurrent layers have been successful and are widely used in practice. In general, they apply dropout to recurrent connections in a way that still preserves long-term memory.

RNNDrop [30], proposed in 2015, provides a simple solution for better memory retention when applying dropout: a single dropout mask is sampled for each input sequence and kept fixed across all of its time steps, as illustrated in Figure 5 (right).


Figure 5: Comparison of sampling a dropout mask at every time step (left) versus once per sequence (right) on an unrolled recurrent neural network (RNN). Horizontal connections are recurrent, while vertical connections are feed-forward. Different colors represent different dropout masks applied to the corresponding connections.

In 2016, Gal and Ghahramani proposed a variant of RNN dropout based on a Bayesian interpretation of dropout methods. The authors point out that if dropout is viewed as a variational Monte Carlo approximation to a Bayesian posterior, then the natural way to apply it to recurrent layers is to generate, for each training sequence, a single dropout mask that zeros out both feedforward and recurrent connections, and to keep that mask fixed across all time steps of the sequence. This is similar to RNNDrop in that masks are generated per sequence, but the derivation leads to dropout being applied at different locations within the LSTM cell.

Recurrent Dropout [14] is another method that preserves memory in LSTMs while still generating a different dropout mask for each input sample, as in standard dropout. It does so by applying dropout only to the part of the RNN that updates the hidden state, not to the state itself. Thus, if an element of the update is dropped, the network's memory is unaffected, rather than part of the hidden state being erased.
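A toy sketch combining the two ideas above: one mask per sequence, applied only to the state update rather than to the stored state. The simplified additive-state cell below is an illustrative stand-in for the LSTM formulations used in the cited papers, not their exact equations.

```python
import numpy as np

np.random.seed(0)

def rnn_with_per_sequence_dropout(xs, W_x, W_h, p=0.5):
    # One mask is sampled per sequence and reused at every time step,
    # and it is applied to the candidate update rather than to the
    # stored state, so dropped units do not erase existing memory.
    hidden = np.zeros(W_h.shape[0])
    m = (np.random.rand(hidden.shape[0]) >= p).astype(hidden.dtype)
    for x_t in xs:
        update = np.tanh(W_x @ x_t + W_h @ hidden)
        hidden = hidden + m * update          # dropout on the update only
    return hidden

W_x = np.random.randn(4, 3) * 0.1
W_h = np.random.randn(4, 4) * 0.1
xs = np.random.randn(10, 3)                   # a toy sequence of length 10
print(rnn_with_per_sequence_dropout(xs, W_x, W_h, p=0.5))
```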

Dropout methods for model compression

Standard Dropout increases the sparsity of neural network weights [8]. This property means that dropout methods can be used to compress neural network models by reducing the number of parameters needed for them to perform effectively. Since 2017, researchers have proposed several dropout-based methods for compressing models in practice.

In 2017, Molchanov et al. [9] proposed using variational dropout [6] (introduced above) to simultaneously sparsify fully connected and convolutional layers. The results show that this method greatly reduces the number of parameters of standard convolutional networks with minimal impact on performance. The resulting sparse representation can then be passed to existing methods that convert sparse networks into compressed models (as in [31]). A similar approach was proposed by Neklyudov et al. [10], who used a modified variational dropout scheme that induces structured sparsity, making the resulting network structure particularly easy to compress.
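A sketch of how a learned per-weight dropout/noise level can be turned into a sparse layer: weights whose log noise level exceeds a threshold are zeroed out. The `log_alpha` values below are random stand-ins for the quantities that variational dropout would actually learn, and the threshold of 3 is only a commonly used illustrative choice.

```python
import numpy as np

np.random.seed(0)

def sparsify_by_dropout_rate(W, log_alpha, threshold=3.0):
    # Weights whose learned noise level is very high are effectively pure
    # noise and can be removed, yielding a compressible sparse layer.
    keep = log_alpha < threshold
    W_sparse = W * keep
    return W_sparse, 1.0 - keep.mean()

W = np.random.randn(64, 64)
log_alpha = np.random.randn(64, 64) * 2.0 + 2.0   # stand-in for learned values
W_sparse, pruned_fraction = sparsify_by_dropout_rate(W, log_alpha)
print(f"pruned fraction: {pruned_fraction:.2f}")
```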

Developing dropout methods for model compression remains a very active area of research. Recently proposed methods include Targeted Dropout [32], in which neurons are adaptively selected and dropped in a way that makes the network robust to subsequent pruning, so that its size can be drastically reduced without losing much accuracy. Another recent method is Ising-Dropout [11], which overlays a graphical Ising model on top of a neural network to identify less useful neurons and drops them during both training and inference.

Monte Carlo Dropout

In 2016, Gal and Ghahramani [7] proposed a Bayesian approach to understanding Dropout, which is widely accepted. They interpret Dropout as a Bayesian approximation of a deep Gaussian process.

Monte Carlo Dropout keeps dropout active at test time and runs multiple stochastic forward passes through the network; the mean of the sampled outputs serves as the prediction and their spread as a simple measure of confidence. In addition to the usual point estimates, this provides an easy way to quantify the uncertainty of neural network outputs, and Monte Carlo Dropout has been widely used for model uncertainty estimation.
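A minimal sketch of Monte Carlo Dropout for a toy two-layer network: dropout stays active at prediction time, several stochastic forward passes are run, and the sample mean and standard deviation give the prediction and a simple uncertainty estimate. The architecture, weights, and the inverted-dropout scaling are illustrative choices.

```python
import numpy as np

np.random.seed(0)

def relu(z):
    return np.maximum(z, 0.0)

def stochastic_forward(x, W1, W2, p=0.5):
    # Dropout is kept ACTIVE at test time; the inverted-dropout scaling
    # keeps the expected activation magnitude unchanged across samples.
    m = (np.random.rand(W1.shape[0]) >= p) / (1.0 - p)
    h = relu(W1 @ x) * m
    return W2 @ h

def mc_dropout_predict(x, W1, W2, p=0.5, n_samples=100):
    # Many stochastic forward passes: mean = prediction, std = uncertainty.
    samples = np.array([stochastic_forward(x, W1, W2, p) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

W1 = np.random.randn(16, 8) * 0.3
W2 = np.random.randn(1, 16) * 0.3
x = np.random.randn(8)
mean, std = mc_dropout_predict(x, W1, W2)
print(mean, std)
```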

Paper link: https://arxiv.org/abs/1904.13310

