Study notes (1): What is a gradient? Why regularize? What is the relationship between bias and variance? How do we choose a model that gives a smaller error?

1. What is a gradient?

Answer: ∇L is the gradient of the loss function L, and the inverted-triangle symbol ∇ (nabla) is the gradient operator: the vector of partial derivatives of L along every direction of the parameter space.
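To make the definition concrete, here is a minimal sketch (my own illustration, not from the original notes) that approximates ∇L numerically by central differences; the toy loss L and the point w are made-up examples.

```python
import numpy as np

def numerical_gradient(L, w, eps=1e-6):
    # approximate each partial derivative of L at w by central differences
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (L(w_plus) - L(w_minus)) / (2 * eps)
    return grad

# toy loss: L(w) = w0^2 + 3*w1^2, so the exact gradient is [2*w0, 6*w1]
L = lambda w: w[0] ** 2 + 3 * w[1] ** 2
print(numerical_gradient(L, np.array([1.0, 2.0])))  # ~ [2., 12.]
```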

2. Why should it be regularized?

Answer: We want a smooth function, one that is less affected by noisy data. Such a function is less sensitive to abnormal points and improves the fault tolerance of the model. So we add the sum of the squared parameters (an L2 penalty) to the Loss Function.
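As an illustration, here is a minimal sketch of a squared-error loss with that L2 penalty added; the names (ridge_loss, lam) are hypothetical, and lam is the regularization strength.

```python
import numpy as np

def ridge_loss(w, X, y, lam=0.1):
    # mean squared error plus lam times the sum of squared parameters
    residual = X @ w - y
    return np.mean(residual ** 2) + lam * np.sum(w ** 2)

# larger lam pushes the weights toward zero, giving a smoother function
w = np.array([0.5, -1.2])
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
print(ridge_loss(w, X, y))
```

In practice the bias term is usually excluded from the penalty, since penalizing it does not make the function smoother.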

3. The relationship between Bias and Variance?

Answer: The more complex the model, the more closely it can fit the real data, and its expected prediction is close to the true value, so the bias is small; but the predictions obtained from different sample sets scatter more, so the variance grows. Conversely, the simpler the model, the more likely the true function lies outside the model family, so the bias is large; but the predictions from different sample sets are more concentrated, so the variance is small. In short, choose a model of moderate complexity, so that the bias is as small as possible while the variance also stays small.

If the bias is large, the model is underfitting; if the variance is large, the model is overfitting.

Solution: for underfitting, redesign the model (add more features or increase its complexity); for overfitting, see question 5 below.
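The trade-off can also be seen empirically. Below is a minimal sketch (my own illustration, not from the notes): polynomials of different degrees are refit on many resampled noisy datasets, and the bias² and variance of the predictions are estimated; the "true" function true_f and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)          # assumed true function
x_test = np.linspace(0, 1, 50)

def fit_predict(degree, n_points=20):
    # fit a polynomial of the given degree to one noisy sample set
    x = rng.uniform(0, 1, n_points)
    y = true_f(x) + rng.normal(0, 0.3, n_points)  # noisy observations
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x_test)

for degree in (1, 3, 9):
    preds = np.array([fit_predict(degree) for _ in range(200)])
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_test)) ** 2)
    var = preds.var(axis=0).mean()
    print(f"degree={degree}: bias^2={bias2:.3f}, variance={var:.3f}")
```

Degree 1 (too simple) shows large bias² with small variance; degree 9 (too complex) shows the opposite.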

4. How to choose a model for a smaller error?

Answer: Do not simply pick the model with the lowest training error, since that favors overfitting. Hold out a validation set (or use N-fold cross-validation) and choose the model with the lowest validation error.
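For instance, here is a minimal sketch using scikit-learn (my choice of tool, not the notes') to compare polynomial degrees by 5-fold cross-validation; the synthetic data and the candidate degrees are made up.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, 40)

# pick the degree with the lowest average cross-validation error
for degree in (1, 3, 5, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    score = cross_val_score(model, X, y, cv=5,
                            scoring="neg_mean_squared_error").mean()
    print(f"degree={degree}: CV MSE={-score:.3f}")
```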

5. How to prevent overfitting?

1. Increase the amount of training data

When the amount of data is small, the model overfits; as the amount of data increases, the overfitting is alleviated.

2. L1, L2, … (Lp) regularization

Add a penalty on the parameters to the loss function (the L2 case is the squared-parameter term from question 2).

3. Dropout, so the neural network cannot rely too heavily on any particular weights

During training, dropout randomly drops (zeros out) a fraction of the neurons (see the sketch after this list).

4. Simplify the neural network
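As a combined illustration of items 2 and 3, here is a minimal PyTorch sketch (the framework is my assumption, not the notes' code): Dropout on the hidden layer, plus L2 regularization supplied through the optimizer's weight_decay argument. Note that nn.Dropout zeroes activations (neurons) rather than individual weight entries.

```python
import torch
import torch.nn as nn

# items 2 and 3 combined: L2 regularization via weight_decay, plus Dropout
model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zero half of the activations at train time
    nn.Linear(64, 10),
)
# weight_decay adds an L2 penalty on the weights to the update rule
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

model.train()                   # dropout is active in training mode
x = torch.randn(8, 100)         # a made-up mini-batch
loss = model(x).pow(2).mean()   # placeholder loss, just to drive backward()
loss.backward()
optimizer.step()

model.eval()                    # dropout is disabled at inference time
```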


Source: blog.csdn.net/deephacking/article/details/104936141