Gradient vector, Jacobian matrix, Hessian matrix

This article covers three related concepts: the gradient vector, the Jacobian matrix, and the Hessian matrix.

Throughout, let the independent variable be x = (x1, x2, …, xn)^T.

Dependent variable:

① When f(x) is one-dimensional (scalar-valued),

the vector formed by its first partial derivatives is the gradient vector g(x),

and the matrix formed by its second partial derivatives is the Hessian matrix.

② When f(x) = (f1(x), f2(x), …, fm(x))^T is multi-dimensional (vector-valued),

the matrix formed by its first partial derivatives is the Jacobian matrix.


Method/Steps

  1. Gradient vector:

    Definition:

    The objective function f is scalar-valued and is a function of the independent variable vector x = (x1, x2, ..., xn)^T.

    Taking the gradient of the scalar function f with respect to the vector x yields a vector of the same dimension as x; this is called the gradient vector.

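    In standard notation, the gradient collects the first partial derivatives of f (a textbook definition, shown here in place of the original figure):

    ```latex
    g(x) = \nabla f(x)
         = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2},
                  \ldots, \frac{\partial f}{\partial x_n} \right)^{T}
    ```
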
  2. Jacobian matrix:

    Definition:

    The objective function f is a vector of functions, f = (f1(x), f2(x), ..., fm(x))^T,

    where the independent variable is x = (x1, x2, …, xn)^T.

    Differentiating the function vector f with respect to x gives a matrix whose number of rows is the dimension m of f and whose number of columns is the dimension n of x; this matrix is called the Jacobian matrix.

    Each row is the transpose of the gradient vector of the corresponding component function.

    [Note]: The gradient vector is a special case of the Jacobian matrix: when the objective function is scalar-valued (m = 1), the Jacobian matrix reduces to the transposed gradient vector.

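    In the usual notation the Jacobian is written as follows (a standard definition, given here in place of the original figure):

    ```latex
    J(x) = \frac{\partial f}{\partial x}
         = \begin{pmatrix}
             \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
             \vdots & \ddots & \vdots \\
             \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
           \end{pmatrix}
         = \begin{pmatrix}
             \nabla f_1(x)^{T} \\
             \vdots \\
             \nabla f_m(x)^{T}
           \end{pmatrix}
    ```
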
  3. Hessian matrix:

    In fact, the Hessian is the Jacobian of the gradient vector g(x) with respect to the independent variable x:

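    In matrix form, the Hessian of a scalar function f is the standard matrix of second partial derivatives (shown here in place of the original figure):

    ```latex
    H(x) = \nabla^{2} f(x)
         = \begin{pmatrix}
             \frac{\partial^{2} f}{\partial x_1^{2}} & \cdots & \frac{\partial^{2} f}{\partial x_1 \partial x_n} \\
             \vdots & \ddots & \vdots \\
             \frac{\partial^{2} f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^{2} f}{\partial x_n^{2}}
           \end{pmatrix}
    ```
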
  4. Inner product:

    The inner product of vectors a and b equals the length of a times the length of b times the cosine of the angle between them.

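    In symbols, with θ the angle between a and b (a standard identity, written out since the original figure is unavailable):

    ```latex
    \langle a, b \rangle = a^{T} b = \sum_{i=1}^{n} a_i b_i = \|a\|\,\|b\|\cos\theta
    ```
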
  5. Application of Hessian Matrix in Newton's Method

    Newton's method is mainly used for two purposes: finding the roots of an equation, and optimization.

    1) Solving equations:

    Not all equations have closed-form root formulas, and even when such formulas exist they may be too complicated to use in practice.

    Newton's method solves such equations iteratively.

    Principle:

    Using Taylor's formula, expand f to first order at x0: f(x) ≈ f(x0) + (x - x0)f'(x0).

    To solve the equation f(x) = 0, set f(x0) + (x - x0)f'(x0) = 0, which gives x = x1 = x0 - f(x0)/f'(x0).

    Because the first-order Taylor expansion f(x) ≈ f(x0) + (x - x0)f'(x0) is only an approximation, x1 is generally not an exact root;

    however, f(x1) is closer to 0 than f(x0), so the solution is refined iteratively.

    Iterating gives xn+1 = xn - f(xn)/f'(xn); when this iteration converges, its limit x* satisfies f(x*) = 0.

    The whole process is as follows:

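    A minimal sketch of this iteration in Python (the helper name newton_root, the tolerance, and the example f(x) = x^2 - 2 are illustrative choices, not from the original article):

    ```python
    def newton_root(f, df, x0, tol=1e-10, max_iter=50):
        """Find a root of f(x) = 0 via x_{n+1} = x_n - f(x_n)/f'(x_n)."""
        x = x0
        for _ in range(max_iter):
            fx = f(x)
            if abs(fx) < tol:      # close enough to a root
                return x
            x = x - fx / df(x)     # Newton update
        return x

    # Example: solve x^2 - 2 = 0 (i.e. approximate sqrt(2)) starting from x0 = 1.
    print(newton_root(lambda x: x**2 - 2, lambda x: 2*x, x0=1.0))  # ~1.41421356
    ```
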
  6. 2) Optimization:

    In an optimization problem, linear optimization can be solved with a fixed-point algorithm, but for nonlinear optimization problems Newton's method provides a solution.

    Suppose we are optimizing an objective function f; the problem of finding the minima of f can be transformed into the problem of solving f' = 0.

    That is, the optimization problem is treated as an equation-solving problem (f' = 0).

    To find the roots of f' = 0, take the second-order Taylor expansion of f(x):

    f(x + Δx) ≈ f(x) + f'(x)Δx + (1/2)f''(x)Δx^2;

    Since f(x) does not depend on Δx, minimizing f(x + Δx) over Δx amounts to minimizing the remaining terms f'(x)Δx + (1/2)f''(x)Δx^2.

    Taking the derivative with respect to Δx and setting it to zero (note that f'(x) and f''(x) are constants with respect to Δx), the condition becomes:

    f'(x) + f''(x)Δx = 0;

    Solving gives:

    Δx = −f'(xn)/f''(xn);

    which yields the iterative formula:

    xn+1 = xn − f'(xn)/f''(xn), n = 0, 1, ...

    It is generally held that Newton's method, because it uses curvature information of the function itself, converges more easily than gradient descent (fewer iterations).

    A typical illustration minimizes the same objective function with both methods: the red curve traces the iterates of Newton's method and the green curve those of gradient descent.
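    A minimal sketch of the multivariate version of this update, where f'(x) becomes the gradient g(x) and f''(x) becomes the Hessian H(x), so the step is x_{n+1} = x_n − H(x_n)^{-1} g(x_n) (the function name newton_minimize and the quadratic test function are illustrative assumptions, not from the original article):

    ```python
    import numpy as np

    def newton_minimize(grad, hess, x0, tol=1e-8, max_iter=50):
        """Minimize f by iterating x_{n+1} = x_n - H(x_n)^{-1} g(x_n)."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:          # gradient ~ 0: stationary point
                break
            x = x - np.linalg.solve(hess(x), g)  # Newton step
        return x

    # Example: f(x, y) = (x - 1)^2 + 10*(y + 2)^2, whose minimum is at (1, -2).
    grad = lambda v: np.array([2 * (v[0] - 1), 20 * (v[1] + 2)])
    hess = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
    print(newton_minimize(grad, hess, x0=[0.0, 0.0]))  # -> [ 1. -2.]
    ```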
