Some understanding of dropout

Original link: https://blog.csdn.net/youhuakongzhi/article/details/94737502
Disclaimer: This is the blogger's original article, released under the CC 4.0 BY-SA license. If you reproduce it, please include the original source link and this statement.
This link: https://blog.csdn.net/youhuakongzhi/article/details/94737502

20. Why is it that, with dropout, a neuron during testing is connected to (on average) twice as many active input neurons as it was during training? And why, to compensate for this, do we need to multiply each neuron's input connection weights by (1 - p) after training?

There is a small but important technical detail. Suppose p = 50%. During training, dropout (p = 0.5) zeroes out half of a neuron's input signals on average, but at test time dropout is not applied, so the neuron receives roughly twice as many active input signals as it did during training. To keep the total magnitude of the input signal unchanged and avoid destabilizing the network, we multiply each neuron's input connection weights by 0.5 after training. This ensures that the input signal at test time matches what the network saw during training. More generally, we need to multiply each input connection weight by the keep probability (1 - p) after training.

Put another way: during training, the average number of active neurons is p times the original number (here p denotes the keep probability, i.e., the fraction of neurons kept active). At test time, all neurons are active, which makes the outputs of the training-time and test-time networks inconsistent. To alleviate this, the output of each neuron is multiplied by p at test time, which also corresponds to averaging over the different thinned networks.
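A minimal numerical sketch of this compensation (my own illustration, not from the original post): `drop_prob` plays the role of the drop probability p from question 20, and plain NumPy arrays stand in for a layer's activations. The first function applies the test-time rescaling by (1 - p) described above; the second shows the "inverted dropout" variant that most libraries use, which rescales during training instead, so nothing needs to change at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, drop_prob, training):
    """Standard dropout: drop units during training, then scale by the keep
    probability (1 - drop_prob) at test time so the expected magnitude of the
    signal reaching the next layer stays the same."""
    if training:
        mask = rng.random(x.shape) >= drop_prob   # keep each unit with prob 1 - p
        return x * mask
    return x * (1.0 - drop_prob)                  # test time: all units on, rescaled

def inverted_dropout_forward(x, drop_prob, training):
    """Inverted dropout (what most libraries implement): rescale by
    1 / (1 - drop_prob) during training, so test time needs no change."""
    if training:
        mask = rng.random(x.shape) >= drop_prob
        return x * mask / (1.0 - drop_prob)
    return x

x = np.ones(100000)
print(dropout_forward(x, 0.5, training=True).mean())           # ~0.5
print(dropout_forward(x, 0.5, training=False).mean())          # 0.5, magnitudes match
print(inverted_dropout_forward(x, 0.5, training=True).mean())  # ~1.0, already matches test time
```

Without the (1 - p) rescaling, the test-time signal would be about twice as large as the training-time signal, which is exactly the mismatch question 20 describes.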

Dropout can be used to prevent overfitting in large networks that lack a large-scale dataset; for small networks, or networks that genuinely lack data, it is not recommended.

 

21. Does dropout slow down training? Does it slow down prediction (i.e., making predictions on new instances)?

Yes, dropout does slow down training, in general by roughly a factor of two. However, it has no effect on prediction, because it is only turned on during training.

 

22. How does a network trained with dropout produce predictions? That is, how are the trained networks combined? Is it the same as bagging?

      When making predictions, dropout is turned off; the final prediction in effect combines all of the differently structured neural networks produced during training. The whole process amounts to many different neural networks voting to decide the final predicted value.

When a layer is wide, the probability of dropping every possible path from input to output becomes small, so this issue is less important for wider layers.
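To make the "combining thinned networks" idea concrete, here is a small sketch (my own, with hypothetical weights and layer sizes, not the original author's code): a single test-time pass with dropout closed and the hidden activations scaled by (1 - p) gives essentially the same output as explicitly averaging the predictions of many randomly thinned networks that all share the same W, b, which is the voting/averaging view described above. Unlike bagging, no separate models are stored, only the one shared set of parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny one-hidden-layer network with made-up weights (hypothetical values,
# purely to illustrate the averaging argument).
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
x = rng.normal(size=(1, 4))
p = 0.5  # drop probability used during training

def forward(x, mask=None, scale=1.0):
    h = np.maximum(x @ W1 + b1, 0.0)               # ReLU hidden layer
    h = h * (mask if mask is not None else scale)  # thin or rescale the hidden layer
    return h @ W2 + b2

# Test-time prediction: dropout is closed, activations scaled by (1 - p).
single_pass = forward(x, scale=1.0 - p)

# Ensemble view: average the predictions of many randomly thinned networks that
# all share the same W, b.  Because the mask is applied after the ReLU and the
# output layer is linear, this average converges to the single scaled pass.
mc_average = np.mean(
    [forward(x, mask=(rng.random(8) >= p)) for _ in range(20000)], axis=0
)

print(single_pass, mc_average)  # the two values should be close
```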

Here we revisit a regularization method that is similar to bagging and yet different from it: dropout.

    Dropout means that, while training a DNN model with forward propagation and back-propagation, at each iteration over a batch of data we randomly remove a portion of the hidden-layer neurons from the fully connected DNN.

    For example, suppose our original DNN model has the following structure:

https://images2015.cnblogs.com/blog/1042406/201702/1042406-20170227134701063-630638511.png

    When training on a batch of data from the training set, we randomly remove a portion of the hidden-layer neurons and use the network with those neurons removed to fit this batch of training data. Below, half of the hidden-layer neurons have been removed:

https://images2015.cnblogs.com/blog/1042406/201702/1042406-20170227134816751-852364682.png

    We then use this network, with part of its hidden-layer neurons removed, to perform one iterative update of all of W, b. This is what is called dropout.

 

    Of course, dropout does not mean these neurons are discarded forever. Before the next batch of data, we restore the DNN model to the original fully connected model, again randomly remove a portion of the hidden-layer neurons, and then iteratively update W, b. Naturally, because the removal is random, the thinned DNN obtained this time is generally not the same as the previous thinned DNN.

  To summarize the dropout method: in each round of iterative gradient descent, the training data is divided into several batches and iterated over batch by batch. For each batch, part of the hidden-layer neurons is randomly removed from the original DNN, and the thinned DNN model is used to iteratively update W, b. After the update for that batch is finished, the thinned DNN is restored to the original DNN model. A toy sketch of this loop follows below.
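The batch-by-batch procedure just summarized can be written out as a toy NumPy sketch (my own assumptions throughout: a one-hidden-layer regression network, made-up layer sizes, data, learning rate, and a squared-error loss; it is not the original author's code). Each batch draws a fresh mask over the hidden neurons, the thinned network performs one update of the shared W, b, and nothing has to be explicitly "restored" because the full parameter set is only masked, never deleted.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: 256 samples, 4 features, regression target.
X = rng.normal(size=(256, 4))
y = X.sum(axis=1, keepdims=True)

# One shared set of parameters W1, b1, W2, b2 for a one-hidden-layer network.
W1, b1 = rng.normal(scale=0.1, size=(4, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(16, 1)), np.zeros(1)
p, lr, batch_size = 0.5, 0.05, 32   # p is the drop probability

for epoch in range(50):
    for i in range(0, len(X), batch_size):
        xb, yb = X[i:i + batch_size], y[i:i + batch_size]

        # "Remove" a random half of the hidden neurons for this batch only.
        mask = (rng.random(16) >= p).astype(float)

        # Forward pass through the thinned network.
        pre = xb @ W1 + b1
        h = np.maximum(pre, 0.0) * mask   # dropped neurons output exactly 0
        pred = h @ W2 + b2
        err = pred - yb                   # gradient of 0.5 * squared error w.r.t. pred

        # Backward pass: dropped neurons get zero gradient, so only the weights
        # touching the kept neurons move, but every thinned network updates the
        # same shared W, b.
        dW2 = h.T @ err / len(xb)
        db2 = err.mean(axis=0)
        dpre = (err @ W2.T) * mask * (pre > 0)
        dW1 = xb.T @ dpre / len(xb)
        db1 = dpre.mean(axis=0)

        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1
        # Nothing is literally restored here: W1, b1, W2, b2 were never deleted,
        # only masked for this batch; the next batch simply draws a new mask.

# After training, compensate as in question 20: multiply the input connection
# weights of the layer that follows dropout by the keep probability (1 - p).
W2 *= (1.0 - p)
```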

    As can be seen from the description above, the regularization ideas behind dropout and bagging are quite different. In dropout, the model's W, b are shared (many blogs say "shared" without explaining it; the point is that there is only one set of W, b, and each differently thinned network trains once on a batch and updates that same W, b). Every thinned DNN's iteration updates the same shared set of W, b. With bagging regularization, by contrast, each DNN model has its own unique set of W, b, and these parameters are independent of one another. What is similar is that both train each model on a dataset obtained by sampling from the original dataset.

    It is obvious that dropout-based regularization is simpler than bagging-based regularization. Of course, there is no free lunch: because dropout iterates over the original data in batches while thinning the network, the original dataset had better be large, otherwise the model may underfit.

24. Why can dropout effectively reduce overfitting and achieve a degree of regularization? The reasons fall into two main areas:

    It achieves a voting effect. For a fully connected neural network, training five different networks on the same data may give several different results, and we can improve accuracy and robustness by letting them decide through a majority vote. Similarly, for a single network trained in batches with dropout, although the different thinned networks may overfit to different degrees, they share one loss function and are in effect optimized simultaneously, so the result is like taking an average, which can more effectively prevent overfitting.
    It reduces complex co-adaptation between neurons. Randomly deleting hidden-layer neurons thins the fully connected network, which effectively reduces the synergy between different features. In other words, some features might otherwise come to depend on fixed interactions between particular hidden nodes; dropout forces each neuron to work together with other, randomly chosen neurons to achieve good results. This eliminates or weakens the joint adaptation between neuron nodes and enhances the network's generalization ability.

Since the weights are updated for each input sample while each hidden node is present only with a certain random probability, there is no guarantee that any two hidden nodes are active at the same time in every update. The weight updates therefore no longer depend on fixed interactions between particular hidden nodes, which prevents a situation where some features are effective only in the presence of other specific features.
--------------------- 

Bagging vs. dropout:

  • In bagging, all classifiers are independent; in dropout, all models share parameters.
  • In bagging, each classifier is trained to convergence on its particular dataset; in dropout, there is no clearly defined training run for each individual model. The network is trained one step at a time (a sample comes in and a randomly sampled sub-network is trained on it).
  • (Shared point) In both methods, the training data for each sub-network or model is a subset obtained by sampling with replacement from the original data. (For bagging, each training set is drawn from the full sample by random sampling with replacement; for dropout, it is as if a sub-network were randomly sampled from the full network's weights, though the sub-network has no weights of its own. In both methods, each newly sampled network or tree is trained on data drawn from the complete dataset.) A small sketch of this contrast follows below.
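A minimal sketch of this contrast (my own illustration: plain linear least-squares models stand in for the DNNs, and the data, learning rate, and iteration counts are made up). Bagging trains several independent parameter sets to convergence, each on its own bootstrap resample; the dropout-style loop keeps one shared parameter vector and takes a single gradient step on a randomly sampled sub-network at a time.

```python
import numpy as np

rng = np.random.default_rng(3)
X, y = rng.normal(size=(100, 4)), rng.normal(size=(100, 1))

# Bagging: several independent models, each trained to convergence on its own
# bootstrap resample of the data, each with its own parameters.
def fit_linear(Xs, ys):
    # Least-squares fit stands in for "train one model to convergence".
    return np.linalg.lstsq(Xs, ys, rcond=None)[0]

bagged = []
for _ in range(5):
    idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
    bagged.append(fit_linear(X[idx], y[idx]))
bagging_pred = np.mean([X @ W for W in bagged], axis=0)

# Dropout-style training: one shared weight vector; each step samples a random
# sub-network (here, a random mask over the input features) and takes a single
# gradient step; no individual sub-network is ever trained to convergence.
W = np.zeros((4, 1))
keep = 0.5
for step in range(5000):
    mask = (rng.random(4) < keep).astype(float)
    pred = (X * mask) @ W
    W -= 0.01 * (X * mask).T @ (pred - y) / len(X)
dropout_pred = X @ (W * keep)   # scale the shared weights by the keep probability at test time

print(bagging_pred[:3].ravel(), dropout_pred[:3].ravel())
```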

 

The main reference: https://www.cnblogs.com/pinard/p/6472666.html (highly recommended) 

https://blog.csdn.net/m0_37477175/article/details/77145459

https://blog.csdn.net/fu6543210/article/details/84450890 
