SIGAI lecture: Mathematics for Machine Learning (2)

This lecture covers the higher mathematics, linear algebra, and probability theory relevant to machine learning.

Outline:

Basic concepts of optimization
Gradient descent
Newton's method
Coordinate descent
Problems faced by numerical optimization
Lagrange multiplier method
Convex optimization problems
Convex sets
Convex functions
Convex optimization
Lagrange duality
KKT conditions

Basic concepts of optimization:

An optimization problem computes an extreme value of a function, i.e. a minimum or a maximum. Typically f(x) is a multivariate function with x ∈ R^n, and the problem is usually stated as a minimization problem.

Here x is called the optimization variable, and f(x) is called the objective function.

There may be constraints on x: one or more equality constraints, inequality constraints, or both at once; the last case is more complicated.

The set of points x within the domain of f(x) that satisfy the constraints forms the feasible region, denoted D.
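For concreteness, the general form of such a constrained problem can be written as follows (standard notation, added here for reference rather than taken from the original):

    % General constrained minimization problem
    \min_{x} \; f(x)
    \text{subject to} \quad g_i(x) = 0, \; i = 1, \dots, p
    \qquad\qquad\quad\;\; h_j(x) \le 0, \; j = 1, \dots, q

    % Feasible region: the points in the domain of f satisfying all constraints
    D = \{\, x \in \mathbb{R}^n \mid g_i(x) = 0,\ h_j(x) \le 0 \,\}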

Local minima and global minima: extreme points are points where the derivative or gradient equals zero, and the functions used in machine learning are generally differentiable.

Can we not simply set the first derivative to zero and solve for the extreme points directly? In practice this is often infeasible: the resulting equation may not be solvable in closed form, for example when it is a transcendental equation, as in the following example:
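As a minimal illustration (a function chosen here for concreteness, not the original's example): take f(x) = e^x + x^2. Setting its derivative to zero gives

    f'(x) = e^x + 2x = 0

which is a transcendental equation with no solution in elementary closed form, so the minimum has to be located numerically.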


In such cases an iterative method is used: the value of the unknown is updated repeatedly until it converges to a point where the derivative is zero.


Gradient descent:

The first numerical optimization algorithm we cover (numerical optimization seeks approximate solutions rather than exact analytical ones) is called gradient descent.

Derivation of the gradient descent iteration formula:

If x lies in a δ-neighborhood of x0, the higher-order infinitesimal term of the Taylor expansion can be discarded and the following approximation holds.

f(x) = f(x0) + ∇f(x0)^T (x − x0) + o(‖x − x0‖). When x is in a δ-neighborhood of x0, f(x) ≈ f(x0) + ∇f(x0)^T (x − x0), that is, f(x) − f(x0) ≈ ∇f(x0)^T (x − x0). How, then, do we make f(x) smaller than f(x0) so that we move step by step toward the minimum point? Make ∇f(x0)^T (x − x0) less than zero. How do we make it less than zero? If the angle between ∇f(x0) and (x − x0) is greater than 90 degrees, their inner product is negative; in particular, taking (x − x0) along the negative gradient direction makes f(x) decrease fastest. This gives the iteration x_{k+1} = x_k − r·∇f(x_k), where the coefficient r in front of ∇f(x_k) is the step size: it keeps x_{k+1} close enough to x_k, i.e. within the neighborhood of x_k where the higher-order terms of the Taylor expansion can be ignored, so r must be sufficiently small.
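A quick numerical check of this argument, as a Python sketch (the quadratic objective and the values of x0 and r below are illustrative choices, not from the original): take one step along the negative gradient and confirm that f decreases.

    import numpy as np

    # Illustrative objective: f(x) = x1^2 + 3*x2^2, with gradient (2*x1, 6*x2)
    def f(x):
        return x[0]**2 + 3.0 * x[1]**2

    def grad_f(x):
        return np.array([2.0 * x[0], 6.0 * x[1]])

    x0 = np.array([1.0, -2.0])
    r = 0.1                      # small step so the first-order Taylor approximation is valid
    x1 = x0 - r * grad_f(x0)     # move along the negative gradient direction

    print(f(x0), f(x1))          # f decreases from 13.0 to about 2.56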

Implementation details:

Initial value: which point x starts from; the starting point should be as close to the extreme point as possible.

Step size: choose a suitable value for r, usually an empirical value, a positive number sufficiently close to 0; sometimes it is adjusted dynamically during the iterations.

Termination rule: stop the loop when ‖∇f(x_k)‖ ≤ ε, i.e. the gradient is sufficiently close to zero, or after a preset number of iterations.
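Putting these three details together, a minimal gradient descent sketch in Python (the parameter names r, eps, max_iter and the example objective are illustrative choices, not from the original):

    import numpy as np

    def gradient_descent(grad, x0, r=0.1, eps=1e-6, max_iter=10000):
        # x0: initial point, ideally chosen close to the extreme point
        # r: step size, a small positive empirical value (could also be adjusted dynamically)
        # eps: stop when ||grad f(x_k)|| <= eps, i.e. the gradient is close to zero
        # max_iter: fallback stop after a fixed number of iterations
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= eps:     # termination rule
                break
            x = x - r * g                    # x_{k+1} = x_k - r * grad f(x_k)
        return x

    # Example: minimize f(x) = (x1 - 1)^2 + 2*(x2 + 3)^2, whose minimum is at (1, -3)
    grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)])
    print(gradient_descent(grad, x0=[0.0, 0.0]))     # approximately [ 1. -3.]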

Newton's method:

This is also a numerical optimization algorithm. Newton was both a physicist and a mathematician; Newton and Leibniz independently invented calculus, a leap forward in the history of mathematics.
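A standard sketch of the derivation (textbook form, not taken from the original): expand f around x_k to second order and set the gradient of the approximation to zero.

    % Second-order Taylor expansion of f around x_k, with Hessian H(x_k)
    f(x) \approx f(x_k) + \nabla f(x_k)^{T}(x - x_k) + \tfrac{1}{2}(x - x_k)^{T} H(x_k)(x - x_k)

    % Setting the gradient of the right-hand side to zero gives the Newton iteration
    x_{k+1} = x_k - H(x_k)^{-1}\, \nabla f(x_k)

Here H(x_k) is the Hessian matrix of f at x_k.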

