Dropout in deep learning

I wanted to write an article about dropout, but someone has already written one very well, so I am borrowing from it directly.

 

Original link: Analysis of the principle of Dropout in deep learning

However, one part of that article is a bit confusing, namely the scaling at test time, so I have reorganized it here:

Dropout can prevent overfitting and improve training speed.

The dropout probability is p, i.e., each neuron is deactivated with probability p. For example, if p is set to 0.4, then on average 40 of 100 nodes will be deactivated and will not participate in training.
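
As a small illustration of what a deactivation probability of p = 0.4 means (this snippet is mine, not from the original article), we can draw a Bernoulli 0/1 mask for 100 nodes and count how many come out as 0:

```python
import numpy as np

p = 0.4                                    # deactivation probability
rng = np.random.default_rng(0)             # seed chosen only for reproducibility
mask = rng.binomial(n=1, p=1 - p, size=100)           # 1 = keep the node, 0 = deactivate it
print("deactivated nodes:", int((mask == 0).sum()))   # roughly 40 on average
```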

Briefly, the logic is as follows (a code sketch of these steps is given after the list):

1. Define the dropout probability p

2. For each neuron, draw a 0/1 value from a Bernoulli distribution; 0 means the neuron is to be deactivated

3. Save the parameter values of the nodes whose value is 0, then deactivate them (in practice, their outputs are multiplied by 0)

4. Perform forward propagation and back propagation, updating only the weights of the non-deactivated nodes; the weights of the deactivated neurons keep the values saved before deactivation

5. Start the next iteration
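
Here is a minimal NumPy sketch of steps 1-4 for a single hidden layer; the function name dropout_forward and the toy shapes are my own choices for illustration, not something from the original article:

```python
import numpy as np

def dropout_forward(h, p, rng, train=True):
    """Apply dropout to activations h with deactivation probability p.

    Draws a Bernoulli 0/1 mask (0 = deactivated) and multiplies the
    deactivated neurons by 0. The mask is returned so that backprop can
    skip the dropped units; their weights keep the values they had
    before deactivation.
    """
    if not train:
        return h, None                               # no deactivation at test time
    mask = rng.binomial(n=1, p=1 - p, size=h.shape)  # 0 means deactivated
    return h * mask, mask                            # deactivation = multiply by 0

# Example usage on a toy hidden layer
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 10))                     # batch of 4, 10 hidden units
h_dropped, mask = dropout_forward(h, p=0.4, rng=rng)
```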

 

To explain: after deactivation there are fewer active neurons, but neurons cannot be deactivated during testing. This means the number of active neurons during training is (1-p) * m, while during testing all m neurons are active, so the activations need to be rescaled to keep the two phases consistent. There are two ways: ① in the training phase, after the deactivation operation, multiply the non-deactivated nodes' outputs (equivalently, their weights) by 1/(1-p); ② in the test phase, multiply the weights by (1-p).
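
The two scaling options can be written out directly; the sketch below is my own, assuming h is the activation of a hidden layer, mask is the Bernoulli 0/1 mask from training, and W is some learned weight matrix:

```python
import numpy as np

p = 0.4
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 10))                   # toy hidden activations
mask = rng.binomial(n=1, p=1 - p, size=h.shape)    # 0 = deactivated

# Way ①: scale during training ("inverted dropout") — multiply the surviving
# nodes by 1/(1-p); the test-time forward pass then uses the weights as-is.
h_train_inverted = h * mask / (1 - p)

# Way ②: leave the training output as h * mask, and instead multiply the
# learned weights by (1-p) at test time.
h_train_plain = h * mask
W = rng.standard_normal((10, 5))                   # some learned weight matrix
W_test = W * (1 - p)                               # applied only when testing
```

Either choice keeps the expected scale of the activations consistent between training and testing.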

 

 

Origin: blog.csdn.net/katrina1rani/article/details/111866496