[Optimization] Second-order convergence algorithm

Both the steepest descent method and the optimal gradient method discussed earlier converge only linearly (at first order); in this chapter we discuss optimization algorithms with second-order convergence.
This type of algorithm not only converges faster, but also guarantees that the optimal solution is found within a finite number of steps (for quadratic objectives).

Intuitively: if an algorithm can find the optimal solution of a quadratic function within a finite number of steps, we say that the algorithm is second-order convergent.


Related background concepts

Properties of algorithms with an optimal step size

In the blog post "[Optimization] Unconstrained Gradient Algorithm", we first described the general framework of gradient algorithms and then derived the iterative form of the steepest descent method; then, to make the choice of the step size at iteration k adaptive, we introduced the idea of an optimal step size (a one-dimensional search over the step size at every iteration), which led to the optimal gradient method.

At the end of that article we gave the conclusion that "the search directions at two adjacent points of the optimal gradient method must be orthogonal".

This conclusion will be used in the subsequent algorithms, and we need to generalize it:

  • For any algorithm that uses the idea of an [optimal step size], the gradient at the current iterate must be orthogonal to the previous search direction.

[Proof]
(figure omitted: proof of the generalized orthogonality property)
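The missing figure presumably contains the standard one-line argument; a sketch of it (with $g_{k+1}$ denoting the gradient at the new iterate and $d_k$ the search direction used to reach it):

```latex
% Exact line search: \alpha_k minimizes \varphi(\alpha) = f(x_k + \alpha d_k).
% Setting the derivative at \alpha_k to zero gives
\varphi'(\alpha_k) = \nabla f(x_k + \alpha_k d_k)^{\top} d_k = g_{k+1}^{\top} d_k = 0 ,
% i.e. the gradient at the new iterate is orthogonal to the previous
% search direction, for any algorithm that uses the optimal step size.
```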

Quadratic functions and related concepts

1. Definition
— a combination of a constant term, a linear term, and a quadratic term
(1) Standard form
(figure omitted: standard form of the quadratic function)
(2) Matrix-vector form
(figure omitted: matrix-vector form of the quadratic function)

In the matrix-vector form, the matrix A can always be written in symmetric form: through the conversion between the matrix-vector form and the standard form, A can always be transformed into a symmetric matrix:
(figure omitted: symmetrization of the matrix A)
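Since the figures are missing, here is a reconstruction of the usual forms (a sketch under the conventional notation $x \in \mathbb{R}^n$, symmetric $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$, $c \in \mathbb{R}$):

```latex
% Standard (componentwise) form:
f(x) = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j + \sum_{i=1}^{n} b_i x_i + c .

% Matrix-vector form:
f(x) = \frac{1}{2} x^{\top} A x + b^{\top} x + c .

% Symmetrization: for any square A, x^{\top} A x = x^{\top}\tilde{A} x with
\tilde{A} = \tfrac{1}{2}\,(A + A^{\top}),
% so A may always be assumed symmetric.
```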

2. Properties

(1) If the matrix A is positive definite, then the corresponding quadratic function f(x) is said to be a positive definite quadratic function.

P.S. Compare this with the definition and classification of quadratic forms.

(2) For a quadratic function, the gradient vector can be obtained from the matrix differentiation rules; if A is a positive definite matrix, then the positive definite quadratic function f(x) has a unique minimizer.
(figure omitted: gradient and minimizer of the quadratic function)

[Proof]
(figure omitted: proof of the unique minimizer)
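A reconstruction of the missing formulas (a sketch, with A symmetric positive definite as above):

```latex
% Gradient of the matrix-vector form (A symmetric):
\nabla f(x) = A x + b .

% Setting the gradient to zero gives the unique stationary point
x^{*} = -A^{-1} b ,
% and positive definiteness of A (the Hessian of f) makes x^{*}
% the unique global minimizer.
```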

Conjugate directions

1. Definition of conjugacy
(1) Orthogonality of vectors
If the inner product of two vectors is 0, the two vectors are said to be orthogonal to each other.
$u^{\top} v = \langle u, v \rangle = 0$

(2) Mutual conjugacy
Given a positive definite matrix A, if two vectors u and v are such that u and Av are orthogonal, then the vectors u and v are said to be conjugate with respect to the matrix A.
$u^{\top} A v = \langle u, Av \rangle = 0$
In particular, if a given set of vectors $u_1, u_2, \dots, u_n$ satisfies $u_i^{\top} A u_j = 0$ for $i \neq j$, then the vectors $u_1, u_2, \dots, u_n$ are said to be mutually conjugate with respect to the matrix A; clearly, a set of vectors that are mutually conjugate with respect to the same matrix must be linearly independent.

Orthogonality can be regarded as a special case of conjugacy: when the matrix A is the identity matrix I, $u^{\top} A v = u^{\top} v$.

For any given positive definite matrix A, we can always find vectors that are mutually conjugate with respect to it; in the discussion of the second-order convergence algorithms in this chapter, we will use such special vectors as the search directions, and no longer use the gradient direction.
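As a quick numerical check (not from the original post; the matrix below is an illustrative choice), conjugacy can be verified by evaluating $u^{\top} A v$. The eigenvectors of a symmetric positive definite matrix are one convenient conjugate set, since they are orthogonal and $u^{\top} A v = \lambda_v\, u^{\top} v = 0$:

```python
import numpy as np

# A small symmetric positive definite matrix (illustrative values).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Eigenvectors of a symmetric matrix are orthogonal, and for an SPD matrix
# they are also conjugate with respect to A: u^T A v = lambda_v * u^T v = 0.
_, eigvecs = np.linalg.eigh(A)
u, v = eigvecs[:, 0], eigvecs[:, 1]

print("u^T v   =", u @ v)        # ~0: orthogonal
print("u^T A v =", u @ A @ v)    # ~0: conjugate with respect to A
```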

2. Algorithm idea
The overall algorithmic framework is the same as that of the [optimal gradient method]; the only change is that conjugate directions are used as the search directions in place of the gradient directions.

3. Conclusions and Theorems

  • Theorem: The algorithm obtained by using conjugate directions as the search directions at every step of the optimization (with the optimal step size) is second-order convergent.

As we said at the very beginning, when discussing [second-order convergence] we only need to verify whether the algorithm can find the optimal solution of a positive definite quadratic function within a finite number of steps.
(figure omitted: proof of finite termination)
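The missing proof figure presumably expands the step from $x_0$ to the minimizer $x^{*}$ in the conjugate basis; a sketch of the usual argument:

```latex
% Let d_0, \dots, d_{n-1} be mutually conjugate w.r.t. the positive definite A;
% they form a basis, so we may write
x^{*} - x_{0} = \sum_{i=0}^{n-1} \sigma_i d_i .
% Multiplying on the left by d_k^{\top} A and using conjugacy isolates each coefficient:
\sigma_k = \frac{d_k^{\top} A\,(x^{*} - x_0)}{d_k^{\top} A\, d_k} .
% One can check that \sigma_k equals the optimal (exact line-search) step \alpha_k
% taken along d_k, so after n steps x_n = x_0 + \sum_k \alpha_k d_k = x^{*}.
```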

  • Conclusion: From the above proof it is not hard to see that the [choice of the initial point] and the [ordering of the conjugate search directions] have no effect on the algorithm (it still reaches the optimum within n steps).

4. Example
(figure omitted: worked example with two given conjugate directions)

In this example both conjugate vectors are given, and we simply carry out the algorithm's computation with them; later sections will explain how to actually find vectors that are conjugate with respect to a given matrix.


Conjugate direction method

The conjugate direction method is a general term for a class of algorithms, describing those methods whose search directions are all mutually conjugate. We can obtain a conjugate direction method by modifying a previously discussed algorithm such as the [optimal gradient method].
The key to a conjugate direction method is how to find vectors that are conjugate with respect to a given matrix.

1. Evaluation
The conjugate gradient method sits between the steepest descent method and Newton's method and needs only first-order derivative information; it overcomes the "slow convergence" shortcoming of the steepest descent method while also avoiding Newton's method's shortcoming of "storing and computing second derivatives at high cost", making it one of the most effective algorithms for solving large-scale (non)linear optimization problems.

  • No matrix storage required
  • Has a faster convergence rate
  • Second-order convergence

2. General conjugate direction method
(figure omitted: the general conjugate direction method)
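Since the algorithm figure is missing, here is a minimal sketch of the general conjugate direction method for a positive definite quadratic $f(x) = \tfrac{1}{2}x^{\top}Ax + b^{\top}x$ (the mutually conjugate directions are assumed to be supplied by the caller; all names are illustrative):

```python
import numpy as np

def conjugate_direction_method(A, b, x0, directions):
    """Minimize f(x) = 0.5 x^T A x + b^T x along the supplied mutually
    conjugate directions, using the exact (optimal) step size."""
    x = np.asarray(x0, dtype=float)
    for d in directions:
        g = A @ x + b                      # gradient at the current point
        alpha = -(g @ d) / (d @ A @ d)     # optimal step size along d
        x = x + alpha * d
    return x                               # equals -A^{-1} b after n steps

# Example: the eigenvectors of A serve as a conjugate set (see the check above).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([-1.0, -2.0])
_, V = np.linalg.eigh(A)
x_star = conjugate_direction_method(A, b, np.zeros(2), [V[:, 0], V[:, 1]])
print(x_star, np.linalg.solve(A, -b))      # the two results coincide
```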

Conjugate Gradient Method

The conjugate gradient algorithm is a typical conjugate direction algorithm: all of its search directions are mutually conjugate. Each search direction $d_k$ is a combination of the negative gradient direction $-g_k$ and the search direction $d_{k-1}$ of the previous iteration, so the storage requirement is small and the computation is convenient.
$d_k = -g_k + \beta_{k-1} d_{k-1}$, where $\beta_{k-1}$ is the combination coefficient.

1. FR Conjugate Gradient Algorithm (Fletcher-Reeves)
(figure omitted: the FR conjugate gradient algorithm)

P.S. We discussed earlier that for positive definite quadratic functions the conjugate direction method converges to the optimal solution in finitely many steps, but the above algorithm is not only applicable to quadratic functions;
therefore the third step [take the currently found point $x_n$, use it to replace the original starting point $x_0$, and then begin a new round of iterations] is necessary.

When these algorithms are implemented on a computer, there are rounding errors. Therefore, even when solving a positive definite quadratic optimization problem, it is possible that the iteration "fails to converge within a finite number of steps", so the third step of the above algorithm is entirely necessary.
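To make the description concrete, here is a minimal sketch of a Fletcher-Reeves iteration for a general smooth function, with a restart every n steps as discussed above (the one-dimensional search is delegated to SciPy's `minimize_scalar`, which is an implementation convenience rather than part of the original description):

```python
import numpy as np
from scipy.optimize import minimize_scalar  # used only for the 1-D step-size search

def fr_conjugate_gradient(f, grad, x0, tol=1e-8, max_rounds=50):
    """Sketch of the FR conjugate gradient method with a restart every n steps."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(max_rounds):
        g = grad(x)
        d = -g                                    # restart: steepest descent direction
        for _ in range(n):
            if np.linalg.norm(g) < tol:
                return x
            alpha = minimize_scalar(lambda a: f(x + a * d)).x   # optimal step size
            x = x + alpha * d
            g_new = grad(x)
            beta = (g_new @ g_new) / (g @ g)      # FR combination coefficient
            d = -g_new + beta * d
            g = g_new
    return x

# Example on a positive definite quadratic: the iterate converges to -A^{-1} b.
A = np.array([[4.0, 1.0], [1.0, 3.0]]); b = np.array([-1.0, -2.0])
print(fr_conjugate_gradient(lambda x: 0.5 * x @ A @ x + b @ x,
                            lambda x: A @ x + b,
                            np.zeros(2)),
      np.linalg.solve(A, -b))
```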

2. Brief understanding
The above algorithm is applicable to any (smooth) function. We can use the special case of a positive definite quadratic function to verify the idea behind constructing the conjugate directions:
(figure omitted: derivation of the FR combination coefficient for a positive definite quadratic)
In the above derivation, the FR formula is used to compute search directions that are mutually conjugate with respect to the given matrix; this is what makes the method the FR conjugate gradient algorithm.
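For reference, the combination coefficient that the missing derivation arrives at is usually written as follows (a standard statement of the FR formula, given here since the figure is unavailable):

```latex
\beta_{k-1} = \frac{g_k^{\top} g_k}{g_{k-1}^{\top} g_{k-1}}
            = \frac{\lVert g_k \rVert^{2}}{\lVert g_{k-1} \rVert^{2}} ,
% which, for a positive definite quadratic with exact line searches, makes
% d_k = -g_k + \beta_{k-1} d_{k-1} conjugate to d_{k-1} with respect to A.
```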

3. Example
(figure omitted: worked example of the FR conjugate gradient algorithm)
4. Evaluation
Although each iteration uses only first-derivative information, the gradient still has to be computed, so the amount of computation per iteration remains considerable.


Powell algorithm

The core hope of the Powell algorithm is that mutually conjugate vectors can be found without computing any gradients.

The observation is that if $x_1$ and $x_2$ are the minimizers obtained by one-dimensional searches along the same search direction $v$ but starting from different initial points, then for a quadratic optimization problem the vector $x_2 - x_1$ is conjugate to the vector $v$ (with respect to the matrix of the quadratic).
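This follows directly from the optimal-step-size property recalled at the beginning of this post; a sketch for a quadratic with gradient $\nabla f(x) = Ax + b$:

```latex
% x_1 and x_2 each minimize f along v, so by the optimal-step-size property
\nabla f(x_1)^{\top} v = 0, \qquad \nabla f(x_2)^{\top} v = 0 .
% Subtracting and using \nabla f(x) = A x + b gives
\left(\nabla f(x_2) - \nabla f(x_1)\right)^{\top} v = (x_2 - x_1)^{\top} A\, v = 0 ,
% i.e. x_2 - x_1 and v are conjugate with respect to A.
```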

1. Algorithm idea and framework.
(figure omitted: framework of the Powell algorithm)
To summarize: for an n-dimensional unconstrained optimization problem, as long as the given termination condition is not met, the algorithm keeps performing rounds of iterations; each round carries out n + 1 one-dimensional searches, where the (n+1)-th search is performed along the direction $x_n - x_0$, and this new direction $[x_n - x_0]$ is then used to replace the initial search direction $v_0$.
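Rather than re-implementing the bookkeeping, a quick way to experiment with the method is SciPy's built-in Powell implementation (the Rosenbrock test function below is an illustrative choice, not from the original post):

```python
import numpy as np
from scipy.optimize import minimize

# Powell's method is derivative-free: it only needs function values.
def rosenbrock(x):
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method="Powell")
print(result.x)       # approximately [1, 1]
print(result.nfev)    # number of function evaluations (no gradient calls)
```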

2. Diagram of the algorithm process

Figure source "In-depth understanding of Powell optimization algorithm"

(figure omitted: diagram of the Powell iteration process)

The blog post referenced above discusses the theory and the optimization process of the Powell algorithm in detail; readers who want a deeper understanding of the Powell algorithm can refer to that original post.

3. Examples
(figure omitted: worked example of the Powell algorithm)


Quasi-Newton Method

1. Overview of Algorithm Ideas

The success of Newton's method on optimization problems lies in its use of the curvature information contained in the Hessian matrix; however, computing the Hessian is a lot of work, and not every objective function has a (readily available) Hessian. We therefore ask
whether an approximation to the curvature of the objective function can be constructed using only the objective function values and first-derivative information, thereby retaining a fast convergence speed; this is the idea of the [quasi-Newton method].

(figure omitted: quasi-Newton method framework)
2. DFP quasi-Newton method
(figure omitted: the DFP quasi-Newton algorithm)
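Since the algorithm figure is missing, here is the update the DFP method is usually written with (a sketch; $H_k$ denotes the approximation to the inverse Hessian, $\delta_k = x_{k+1} - x_k$, and $\gamma_k = g_{k+1} - g_k$):

```latex
% Search direction and step:
d_k = -H_k g_k , \qquad x_{k+1} = x_k + \alpha_k d_k \quad (\alpha_k \text{ from a line search}),
% DFP update of the inverse-Hessian approximation:
H_{k+1} = H_k
        + \frac{\delta_k \delta_k^{\top}}{\delta_k^{\top} \gamma_k}
        - \frac{H_k \gamma_k \gamma_k^{\top} H_k}{\gamma_k^{\top} H_k \gamma_k} .
```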

Source: blog.csdn.net/kodoshinichi/article/details/110161455