Machine Learning 2: Regression Case

Loss function

To measure the quality of each function in the function set, we need an evaluation function, namely the loss function, denoted L. The loss function is a function of functions:
L(f) = L(w, b)
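As a concrete illustration, here is a minimal Python sketch (assuming a one-dimensional linear model f(x) = w·x + b and mean squared error as the loss, neither of which the text fixes): choosing a function f from the function set amounts to choosing (w, b), so the loss is really a function of (w, b).

```python
import numpy as np

def loss(w, b, x, y):
    """Mean squared error of the linear model f(x) = w*x + b."""
    y_hat = w * x + b              # predictions of the candidate function
    return np.mean((y - y_hat) ** 2)

# Toy data (illustrative, not from the original post)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])
print(loss(2.0, 0.0, x, y))        # evaluating one candidate (w, b)
```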

Gradient Descent

The great thing about gradient descent is that as long as L(f) is differentiable, gradient descent can be used to search over f and find better-performing parameters.
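A minimal gradient-descent sketch for the MSE loss above (the learning rate, iteration count, and toy data are illustrative choices, not from the original):

```python
import numpy as np

def grad(w, b, x, y):
    """Analytic gradient of the MSE loss with respect to (w, b)."""
    err = (w * x + b) - y
    dw = 2 * np.mean(err * x)
    db = 2 * np.mean(err)
    return dw, db

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])
w, b, lr = 0.0, 0.0, 0.05
for _ in range(1000):
    dw, db = grad(w, b, x, y)
    w -= lr * dw                   # step opposite the gradient
    b -= lr * db
print(w, b)                        # approaches the least-squares fit (about 2.05, -0.03)
```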

Regularization (L1 and L2 regularization to address overfitting)

Since the distribution of the real data cannot be known in advance, we instead adjust the evaluation criterion, the loss function itself, as much as possible:

  • We want our model's expression to be as flexible as possible, containing many parameters and a high degree of nonlinearity.
  • But the loss function should be able to control the parameters and the shape of the resulting curve so that overfitting does not occur.
  • When the real data follow a highly nonlinear curve, the higher-order coefficients trained under this loss function come out relatively large, making the resulting curve more strongly curved.
  • When the real data follow a low-order, nearly linear distribution, the higher-order coefficients trained under this loss function come out relatively small or even zero, making the resulting curve close to linear.
    How can we ensure that such parameters are learned? This is where L1 and L2 regularization come in.
  • L1 regularization adds the term λ ∑ |w_j| to the loss.
  • L2 regularization adds the term λ ∑ (w_j)² to the loss (see the sketch after this list).
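To make the two penalty terms concrete, here is a minimal sketch (the polynomial feature construction, the λ value, and the function name are illustrative assumptions, not from the original): the penalty charges a cost for large weights, so high-order coefficients shrink toward zero unless reducing the data-fitting error justifies keeping them large.

```python
import numpy as np

def regularized_loss(w, X, y, lam=0.1, kind="l2"):
    """MSE of the model y_hat = X @ w plus an L1 or L2 penalty on w."""
    mse = np.mean((y - X @ w) ** 2)
    if kind == "l1":
        penalty = lam * np.sum(np.abs(w))   # lambda * sum_j |w_j|
    else:
        penalty = lam * np.sum(w ** 2)      # lambda * sum_j (w_j)^2
    return mse + penalty

# Polynomial features [1, x, x^2, x^3]: the penalty discourages large
# high-order coefficients unless the data really need them.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.vander(x, N=4, increasing=True)
y = 2.0 * x + 0.1                  # nearly linear data
w = np.array([0.1, 2.0, 0.0, 0.0]) # low-order fit: zero high-order terms
print(regularized_loss(w, X, y, kind="l1"))
print(regularized_loss(w, X, y, kind="l2"))
```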