Dropout in Deep Learning Tuning

What is dropout

Dropout can formally be shown to act as an adaptive form of regularization. Unlike plain L2 regularization, the effective penalty it places on different weights is not the same: it depends on the size of the activations those weights are multiplied by.

The effect of dropout is therefore similar to regularization. The difference from L2 regularization is in how it is applied, and dropout can adapt better to inputs of different scales.
Note that a keep-prob value of 1 means that all units are kept, i.e. dropout is not used in that layer. For layers that contain many parameters and are likely to overfit, we can set keep-prob to a relatively small value in order to apply stronger dropout; this is a bit like tuning the regularization parameter of L2 regularization, where we choose to regularize some layers more heavily than others. Technically, we can also apply dropout to the input layer, which gives us the chance to drop one or more input features, although we usually don't do this in practice. A keep-prob of 1 is the most common value for the input layer; a slightly lower value such as 0.9 is also possible, but you would never want to eliminate half of the input features. If we follow this rule, keep-prob stays close to 1 even when dropout is applied to the input layer.
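To make the per-layer keep-prob idea concrete, here is a minimal inverted-dropout sketch in NumPy. The layer sizes and keep-prob values are illustrative assumptions, not prescriptions from the text above; the key point is that surviving activations are scaled by 1/keep-prob so the expected output of each layer is unchanged.

```python
import numpy as np

def dropout_forward(a, keep_prob):
    """Inverted dropout on one layer's activations `a`.

    Each unit is kept with probability `keep_prob`; the surviving
    activations are scaled by 1/keep_prob so the expected value of
    the layer output stays the same (no rescaling needed at test time).
    """
    mask = np.random.rand(*a.shape) < keep_prob
    return (a * mask) / keep_prob

# Illustrative activations for three layers, each with its own keep-prob:
# stronger dropout on the large hidden layer that is most likely to
# overfit, and keep_prob = 1.0 (no dropout) on the input layer.
np.random.seed(0)
a0 = np.random.rand(784, 32)   # input features
a1 = np.random.rand(512, 32)   # large hidden layer, prone to overfitting
a2 = np.random.rand(64, 32)    # small hidden layer

a0_d = dropout_forward(a0, 1.0)   # identical to a0: dropout effectively off
a1_d = dropout_forward(a1, 0.5)   # about half the units zeroed, rest scaled by 2
a2_d = dropout_forward(a2, 0.9)   # light dropout
```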

To sum up, if you are worried that some layers are more prone to overfitting than others, you can set the keep-prob value of those layers lower than the others. The disadvantage is that, for cross-validation, you now have more hyperparameters to search. Another option is to apply dropout only on some layers and not on others; then dropout contributes only a single hyperparameter, keep-prob.
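As a sketch of that second option, the snippet below applies dropout only after the largest hidden layer and leaves the other layers untouched. It uses the Keras Dropout layer, which takes a drop rate rather than a keep probability (rate = 1 - keep-prob); the layer sizes are again illustrative assumptions.

```python
import tensorflow as tf

# Dropout only where we suspect overfitting: after the 512-unit layer.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.5),                  # rate 0.5, i.e. keep-prob 0.5
    tf.keras.layers.Dense(64, activation="relu"),  # no dropout on this smaller layer
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```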

Implementation tips
Dropout had many of its early successes in computer vision. In computer vision the input is very large (many pixels are fed in) and there is almost never enough data, so overfitting is common and dropout is used very frequently. Some computer vision researchers like it so much that it has almost become their default choice. But keep in mind that dropout is a regularization method: it helps prevent overfitting. Unless the algorithm is overfitting, I would not use dropout, which is why it is used less in other fields. It is mainly in computer vision, where we usually don't have enough data and overfitting is a constant problem, that researchers are so fond of it, and intuitively I don't think that preference necessarily generalizes to other domains.

One major disadvantage of dropout
The cost function J is no longer well defined, because a different set of nodes is randomly removed on every iteration. If you repeatedly check the performance of gradient descent, it becomes hard to verify that a well-defined cost function decreases after each iteration: the cost we are actually optimizing is not clearly defined, or is at least difficult to compute, so we lose the debugging tool of plotting J against the iteration number. What I usually do is turn dropout off by setting keep-prob to 1, run the code, and make sure that J decreases monotonically; then I turn dropout back on, trusting that no bug was introduced while dropout was active. You can also try other checks; we don't have statistics on how well they perform, but you can use them together with dropout.
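A minimal sketch of that debugging routine is below. It assumes a hypothetical train_step(keep_prob) function standing in for one iteration of your own training loop that returns the cost J; the text above does not prescribe any particular interface.

```python
def check_cost_decreases(train_step, num_iters=100):
    """Run the loop with dropout disabled (keep_prob = 1) and verify that
    the cost J decreases monotonically before re-enabling dropout.
    `train_step` is a hypothetical stand-in for one training iteration."""
    costs = [train_step(keep_prob=1.0) for _ in range(num_iters)]
    for i in range(1, len(costs)):
        if costs[i] > costs[i - 1]:
            print(f"J increased at iteration {i}: {costs[i-1]:.4f} -> {costs[i]:.4f}")
            return False
    print("J decreased monotonically with dropout off; re-enable dropout.")
    return True
```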

Origin blog.csdn.net/qq_38574975/article/details/107573859