Huashu Reading Notes (3) - Numerical Computation

Summary of all notes: "Deep Learning" (the Flower Book) - Summary of Reading Notes

"Deep Learning" PDF free download: "Deep Learning"

1. Overflow and Underflow

A particularly devastating form of rounding error is underflow, which occurs when numbers near zero are rounded down to zero. Another destructive form of numerical error is overflow, which occurs when numbers of large magnitude are approximated as $\infty$ or $-\infty$.
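For a concrete illustration (a minimal NumPy sketch; the values are made up), exponentiating a large positive number overflows to inf, while exponentiating a large negative number underflows to exactly zero:

```python
import numpy as np

x = np.array([1000.0, -1000.0])

# exp(1000) exceeds the largest representable float64 (~1.8e308),
# so it overflows to inf; exp(-1000) is smaller than the smallest
# positive float64, so it underflows to exactly 0.0.
with np.errstate(over="ignore", under="ignore"):
    print(np.exp(x))   # [inf  0.]
```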

One example of a function that must be stabilized against overflow and underflow is the softmax function. The softmax function is often used to predict the probabilities associated with a Multinoulli distribution, and is defined as $\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{n}\exp(x_j)}$.
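The standard fix described in the book is to evaluate softmax on $z = x - \max_i x_i$ instead: the result is mathematically unchanged, but the largest exponent becomes $\exp(0) = 1$. A minimal NumPy sketch (the function names are mine, not the book's):

```python
import numpy as np

def softmax_naive(x):
    e = np.exp(x)
    return e / e.sum()

def softmax_stable(x):
    # Subtracting the maximum does not change the result mathematically,
    # but it bounds the largest exponent at exp(0) = 1, preventing overflow,
    # and the denominator always contains at least that 1, preventing
    # division by an underflowed zero.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

x = np.array([1000.0, 1001.0, 1002.0])
print(softmax_naive(x))   # [nan nan nan] -- exp overflows
print(softmax_stable(x))  # well-defined probabilities
```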

2. Poor Conditioning

The condition number characterizes how rapidly a function changes with respect to small changes in its inputs. Functions that change rapidly when their inputs are perturbed slightly can be problematic for scientific computation, because rounding errors in the inputs can cause large changes in the output.
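For matrix inversion, the book measures conditioning with the ratio $\max_{i,j}\lvert\lambda_i/\lambda_j\rvert$ of the largest to the smallest eigenvalue magnitude of $A$. A minimal sketch (the example matrix is made up):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1e-6]])   # eigenvalues 1 and 1e-6

eigvals = np.linalg.eigvals(A)
# Ratio of the largest to the smallest eigenvalue magnitude: a large
# ratio means multiplying by A^{-1} amplifies input error strongly,
# i.e. the problem is poorly conditioned.
cond = np.abs(eigvals).max() / np.abs(eigvals).min()
print(cond)                       # 1e6
print(np.linalg.cond(A, p=2))     # same value for this symmetric A
```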

3. Gradient-Based Optimization

We can reduce $f$ by moving in the direction of the negative gradient. This is known as the method of steepest descent, or gradient descent.
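The update rule is $x' = x - \epsilon\nabla_x f(x)$, where $\epsilon$ is the learning rate. A minimal sketch on a simple quadratic (the objective, starting point, and step size are chosen only for illustration):

```python
import numpy as np

def f(x):
    return 0.5 * np.sum(x ** 2)      # simple convex objective

def grad_f(x):
    return x                         # gradient of 0.5 * ||x||^2

x = np.array([3.0, -4.0])
epsilon = 0.1                        # learning rate

# Repeatedly step in the direction of the negative gradient.
for _ in range(100):
    x = x - epsilon * grad_f(x)

print(x, f(x))   # both close to zero: the minimum of f
```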

The Hessian matrix is equivalent to the Jacobian matrix of the gradient: its $(i, j)$ entry is the second partial derivative $\frac{\partial^2 f}{\partial x_i \partial x_j}$, and the second derivative of $f$ in the direction of an eigenvector of the Hessian is the corresponding eigenvalue.

One example of using this second-order information is Newton's method (it will be explained separately when I study the appendix of "Statistical Learning Methods").
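For reference, Newton's update is $x' = x - H(f)(x)^{-1}\nabla_x f(x)$. A minimal sketch on a quadratic objective, where a single Newton step reaches the minimum (the matrix $A$ and vector $b$ below are made up):

```python
import numpy as np

# Quadratic objective f(x) = 0.5 * x^T A x - b^T x
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # symmetric positive definite Hessian
b = np.array([1.0, -1.0])

def grad_f(x):
    return A @ x - b                # gradient of the quadratic

hessian = A                         # Hessian of a quadratic is constant

x = np.zeros(2)
# Newton step: x' = x - H^{-1} grad. For a quadratic objective this
# jumps directly to the minimum in a single step.
x = x - np.linalg.solve(hessian, grad_f(x))

print(x, np.allclose(grad_f(x), 0.0))   # gradient is zero at the solution
```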

4. Constrained Optimization

Simply put, we want to find the maximum or minimum value of $f(x)$ while imposing some constraints on $x$.

The Karush–Kuhn–Tucker (KKT) method is a very general solution for constrained optimization.

KKT conditions (necessary, but not sufficient, conditions for a point to be optimal; the generalized Lagrangian they refer to is written out after the list):

  1. The gradient of the generalized Lagrangian is zero;
  2. All constraints on $x$ and on the KKT multipliers are satisfied;
  3. "Complementary relaxation" shown by inequality constraints: α ⊙ h (x) = 0 \alpha\odot h(x)=0ah(x)=0

5. Example: Linear Least Squares

Refer directly to the description on page 85 of the textbook.
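As a companion to that page, here is a minimal gradient-descent sketch for minimizing $f(x) = \frac{1}{2}\lVert Ax - b\rVert_2^2$ (the matrix, vector, step size, and tolerance below are made up for illustration):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.zeros(2)
epsilon = 0.05        # step size (learning rate)
delta = 1e-8          # stop when the gradient norm is this small

# Gradient of 0.5 * ||Ax - b||^2 is A^T (Ax - b); step along the
# negative gradient until it is nearly zero.
gradient = A.T @ (A @ x - b)
while np.linalg.norm(gradient) > delta:
    x = x - epsilon * gradient
    gradient = A.T @ (A @ x - b)

print(x)
print(np.linalg.lstsq(A, b, rcond=None)[0])   # closed-form solution for comparison
```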

Next chapter: Huashu Reading Notes (4) - Machine Learning Basics


Source: blog.csdn.net/qq_41485273/article/details/112755968