[Optimization] Unconstrained gradient algorithm


Starting from this section, we move from one-dimensional optimization problems to multi-dimensional ones; among the approaches for solving multi-dimensional optimization problems, the [Gradient Algorithm] is one of the most common and simplest.
This chapter focuses on [Unconstrained Gradient Techniques], which are aimed at unconstrained optimization problems.

Disadvantage: because the [gradient] is local information, the algorithm can usually only find a local optimum of the function.
Advantage: the algorithm is relatively simple and easy to implement.


Algorithm framework

As in the previous blog post, the gradient algorithm is implemented within the framework of the [Successive Descent Method].
[Figure: framework of the successive descent method]
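
As a rough illustration of that framework (a minimal sketch, not taken from the figure; the function names, the fixed step size and the stopping rule here are assumptions), the successive descent loop can be written in Python as:

```python
import numpy as np

def descent(grad, x0, step=0.1, tol=1e-6, max_iter=1000):
    """Generic successive descent: x_{k+1} = x_k + step * d_k with d_k = -grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad(x)                 # search direction: the negative gradient
        if np.linalg.norm(d) < tol:  # stop once the gradient is (almost) zero
            break
        x = x + step * d             # take a step along the descent direction
    return x
```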

Gradient (descent) direction

1. Definition of gradient vector and gradient direction
[Figure: definition of the gradient vector and the gradient direction]
2. Why choose the gradient direction as the search direction for optimization?
[Figure: the gradient direction as the direction of steepest change]
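
A minimal reconstruction of the standard definitions behind the two figures (not copied from them): for f : R^n → R,

$$\nabla f(x) = \left[\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right]^{\top}, \qquad \left.\frac{\partial f}{\partial d}\right|_{x} = \nabla f(x)^{\top} d .$$

Among all unit directions d, this directional derivative is most negative when d = −∇f(x)/‖∇f(x)‖, which is why the negative gradient is chosen as the (steepest) descent search direction.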

Convergence

" Search along the direction of gradient descent, the optimal value of f(x) is not guaranteed to be obtained in finite iterations "

From the earlier derivation we know that, when the search is performed along the (negative) gradient direction, each unit step yields a corresponding decrease in the function value. The formula is as follows:
[Figure: decrease in the function value obtained per step along the negative gradient]
However, at the optimal point x*, the partial derivatives of the function with respect to all components are 0:
[Figure: first-order condition ∇f(x*) = 0]
This also means that as the function approaches the optimal point, the gradient only tends to 0 asymptotically, so the obtained decrease df in the function value also tends to 0, and the exact optimal point cannot be reached.
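
A hedged reconstruction of that argument (using the standard first-order relation, not necessarily the exact formula shown in the figures): for a small step dx = −k ∇f(x),

$$df \approx \nabla f(x)^{\top} dx = -k\,\|\nabla f(x)\|^{2} \le 0 ,$$

so as x approaches x* and ∇f(x) → 0, the per-step decrease df also tends to 0 and the iterates only approach x* asymptotically.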


Steepest descent method

1. Algorithm idea
[Figure: algorithm idea of the steepest descent method]
2. Example
[Figure: worked example of the steepest descent iteration]
Given an initial point x_0 = [x_1, x_2]^T, the above formula can be iterated many times until the point found is close to the optimal point x*.
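
A minimal sketch of fixed-step steepest descent (the update x_{j+1} = x_j − k ∇f(x_j)); the quadratic test function f(x) = x_1² + 2·x_2² and the starting point [2, 2]^T are assumptions chosen to be consistent with the gradients quoted in the later example, not read from the figure:

```python
import numpy as np

def f(x):
    return x[0] ** 2 + 2 * x[1] ** 2        # assumed test function

def grad_f(x):
    return np.array([2 * x[0], 4 * x[1]])   # its gradient

def steepest_descent_fixed(x0, k=0.1, tol=1e-6, max_iter=1000):
    """Fixed-step steepest descent: x <- x - k * grad_f(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - k * g
    return x

print(steepest_descent_fixed([2.0, 2.0]))   # approaches the optimum [0, 0]
```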

3. Step size k

Following the example above, it suffices to iterate according to the derived recursion; but one problem remains in the algorithm: what concrete value should the step size k take?

[Figure: effect of a too small vs. a too large step size k]
① If k is too small, each iteration takes only a small step, and many iterations are needed to approach the optimal point;
② If k is too large, the iteration can decrease very efficiently at the beginning, but will then oscillate because the step length is too large.

For this reason, we would like the choice of step size to be [adaptive]: at the beginning, when far from the optimal point, a larger step size improves efficiency; when close to the optimal point, the step size gradually shrinks so that the optimal point is approached slowly.
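
A small experiment along the lines of ① and ② (reusing grad_f from the sketch above; the specific step values are illustrative only):

```python
for k in (0.01, 0.2, 0.45):
    x = np.array([2.0, 2.0])
    for i in range(200):
        x = x - k * grad_f(x)                 # fixed-step update
        if np.linalg.norm(grad_f(x)) < 1e-6:
            break
    print(f"k={k}: stopped after {i + 1} iterations at x={x}")
```

With the tiny step the loop runs out of iterations before getting close to the optimum, while the overly large step makes the x_2 coordinate overshoot and oscillate in sign before it settles.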


Optimal Gradient Method

1. Algorithmic Ideas

This algorithm addresses the problem of [how to choose the optimal step size k].

① Starting from a given initial point, we search along the gradient direction at that point until we find the local optimum along that line (after finding this new point, we do not keep moving in that direction, otherwise the function value would no longer be optimal);

② At this new point, we compute the new gradient direction, and then perform the same local optimal search along this new gradient direction following the previous idea;

equivalently, each iteration performs a one-dimensional search for the optimum along [the gradient direction of the current point];

this one-dimensional search is a problem in the step size k: we need to determine which k to choose so that moving by that amount along the gradient direction reaches the local optimum on that line (see the formula after this description).

The above process is repeated until the optimal point x* is approached.
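
In formulas, the exact line search just described is commonly stated as follows (a standard formulation, not copied from the figure below): at the j-th iteration,

$$k_j = \arg\min_{k \ge 0} f\big(x_j - k\,\nabla f(x_j)\big), \qquad x_{j+1} = x_j - k_j\,\nabla f(x_j).$$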

2. Algorithm framework and derivation
[Figure: framework and derivation of the optimal gradient method]
3. Evaluation
① It can effectively reduce the number of steps taken (the number of iterations);
② but the amount of computation and overhead required in each iteration becomes larger.

4. Example
[Figure: worked example of the optimal gradient method]

[The search directions at two adjacent points of the optimal gradient method must be orthogonal]

We can use the example above to verify this conclusion:
∇f(x_0) = [4, 8]^T and ∇f(x_1) = [16/9, −8/9]^T, so <∇f(x_0), ∇f(x_1)> = 4·(16/9) + 8·(−8/9) = 0.
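
A minimal sketch of the optimal gradient method on a quadratic consistent with the gradients quoted above (for f(x) = ½·x^T Q x the exact step along −g is k = g^T g / (g^T Q g); the matrix Q and the starting point are assumptions, since the original example is only visible through its gradients):

```python
import numpy as np

Q = np.diag([2.0, 4.0])                    # assumed: f(x) = x1^2 + 2*x2^2 = 0.5 * x^T Q x

def grad(x):
    return Q @ x

def optimal_gradient(x0, tol=1e-8, max_iter=100):
    x = np.asarray(x0, dtype=float)
    gradients = []
    for _ in range(max_iter):
        g = grad(x)
        gradients.append(g)
        if np.linalg.norm(g) < tol:
            break
        k = (g @ g) / (g @ Q @ g)          # exact line-search step for a quadratic
        x = x - k * g
    return x, gradients

x_opt, gradients = optimal_gradient([2.0, 2.0])
print(gradients[0], gradients[1])          # [4. 8.] and approximately [16/9, -8/9]
print(np.dot(gradients[0], gradients[1]))  # ~0: successive search directions are orthogonal
```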

5. Discussion of convergence

① If each step uses the optimal step size, the algorithm converges linearly:
[Figure: linear convergence rate of the optimal gradient method]
② Because each step only uses the gradient direction at the current point, the algorithm can only be guaranteed to converge to a local extreme point; if the function has multiple extreme points, the final convergence result depends on the initial point.

③ The number of iterations (the convergence speed) is also related to the choice of the initial point.

④ Because gradient algorithms cannot reach the exact optimal point, the following loop termination criteria are often used in computer implementations:
[Figure: common loop termination criteria]
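
The exact criteria in the figure are not reproduced here; the usual choices (any combination of which may be what the figure showed) compare the gradient norm, the change in x, or the change in f against a small tolerance ε, e.g.:

```python
import numpy as np

def should_stop(x_prev, x, f_prev, f_val, g, eps=1e-6):
    """Typical termination tests for a gradient loop (illustrative thresholds)."""
    return (np.linalg.norm(g) < eps                # gradient close to zero
            or np.linalg.norm(x - x_prev) < eps    # iterates barely moving
            or abs(f_val - f_prev) < eps)          # objective barely decreasing
```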

Origin blog.csdn.net/kodoshinichi/article/details/110039237