[Optimization] One-Dimensional Search Techniques


In the second post of this series, "[Optimization] Optimization Related Conditions", we discussed the sufficient conditions, necessary conditions, and necessary-and-sufficient conditions for local (and global) extreme points of a function. For realistic, complex problems, however, it is usually difficult to find the optimum directly from the condition "the first derivative equals 0".

In this section we will learn how to find an optimum numerically, starting from the most basic case: one-dimensional functions.
We will see later that the optimization of multivariate functions typically contains one-dimensional optimization as a subproblem.


Basic ideas of various optimization algorithms

1. Problem abstraction: a blind person climbing a mountain

Simply put, imagine a blind person who wants to reach the top of a mountain (or return to its foot) with only a cane. He can use the cane only to probe local information around his current position, decide the best direction to walk in from that information, take a few steps, and repeat the process until he reaches the destination.

  • The blind person and the cane: in many cases, even if we know the analytic expression of the objective function, we cannot see its global shape or properties; we are effectively "blind" to the global behavior of the function.
  • Probing local information with the cane: we only know the function values in some neighborhood of the current point, and perhaps a gradient value in that neighborhood, and we must make decisions using this local information alone.

2. Iterative Descent

  • Start from an initial point x_0 and compute the local information at this point.
  • Use this information to choose an appropriate search direction.
  • Move along the search direction to find the next point x_1, then compute the local information around the new point.
  • Repeat the process to generate a sequence of points x_i (i = 1, 2, ..., n), making sure that f(x) decreases at every step (if the goal is to minimize the function); a rough skeleton of this loop is sketched below.
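As a minimal sketch of the loop just described (choose_direction and choose_step are hypothetical placeholders for the decisions made from local information; concrete algorithms fill them in differently):

```python
def iterative_descent(f, x0, choose_direction, choose_step, max_iter=100, tol=1e-8):
    """Generic descent loop: the direction and step-size rules are supplied by the caller."""
    x = x0
    for _ in range(max_iter):
        d = choose_direction(f, x)      # e.g. a descent direction obtained from local information
        step = choose_step(f, x, d)     # e.g. obtained by a one-dimensional search along d
        x_new = x + step * d
        if abs(f(x_new) - f(x)) < tol:  # stop once the decrease becomes negligible
            return x_new
        x = x_new
    return x
```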

3. Issues to consider

① How should the search direction be chosen?
② How far should we move along that direction?
③ How can we guarantee that the algorithm converges to the optimal point?
④ How fast does the algorithm converge to the optimal point?


Unimodal function

1. Definition of unimodal function

Suppose the function f(x) attains a minimum at the point x* in the interval [a, b].

Then f(x) is unimodal on [a, b] if, for any point x in [a, b], the function value f(x) decreases as x moves toward x*.

In plain terms, there is only one minimum point in the interval [a, b].
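Stated formally (a standard way of writing the same definition, with x* the minimizer):

$$
a \le x_1 < x_2 \le x^* \;\Rightarrow\; f(x_1) > f(x_2),
\qquad
x^* \le x_1 < x_2 \le b \;\Rightarrow\; f(x_1) < f(x_2).
$$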

ps In this section, the discussion of one-dimensional search techniques revolves entirely around unimodal functions.


2. Important conclusions
Suppose f(x) is unimodal on the interval [a, b] and attains its minimum at x* ∈ [a, b].
To obtain a smaller subinterval of [a, b] that still contains the minimizer x*, at least two additional function evaluations are required: with two interior points x_1 < x_2, if f(x_1) < f(x_2) the minimizer must lie in [a, x_2], otherwise it lies in [x_1, b]; a single interior point cannot tell us on which side of it x* lies.
ps This conclusion is very important for the search techniques discussed below.


(1) Dichotomous search

1. Basic idea

Following the conclusion above, each iteration computes two new points and uses them to determine the new interval; in the dichotomous search, the two points are placed symmetrically about the midpoint of the current interval, a small distance ε/2 on either side, and the comparison of their function values determines the endpoints of the new interval.

2. Example
(figure omitted)

[Meaning of ε]
Suppose the interval length after the n-th iteration is L_n. From the construction above, L_{n+1} = (L_n + ε)/2. Solving this recurrence shows that the interval length converges to ε (from above), so ε should be chosen no larger than the acceptable uncertainty in locating the minimizer.
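Unrolling the recurrence makes this explicit:

$$
L_n - \varepsilon = \tfrac{1}{2}\,(L_{n-1} - \varepsilon)
\;\Rightarrow\;
L_n = \varepsilon + \frac{L_0 - \varepsilon}{2^{\,n}} \;\longrightarrow\; \varepsilon
\quad (n \to \infty).
$$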

ps The iteration process described above is straightforward and easy to implement in a programming language.
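A minimal sketch of the dichotomous search, assuming f is unimodal on [a, b] (eps and tol are illustrative parameter names; tol must be larger than eps, since the interval length never shrinks below eps):

```python
def dichotomous_search(f, a, b, eps=1e-6, tol=1e-4):
    """Shrink [a, b] around the minimizer of a unimodal f.

    Each round evaluates two points placed eps/2 on either side of the midpoint,
    so the interval length follows L_{n+1} = (L_n + eps) / 2.
    """
    while (b - a) > tol:
        mid = (a + b) / 2
        x1, x2 = mid - eps / 2, mid + eps / 2
        if f(x1) < f(x2):
            b = x2          # the minimizer lies in [a, x2]
        else:
            a = x1          # the minimizer lies in [x1, b]
    return (a + b) / 2

# usage: approximate the minimizer of (x - 1.3)^2 on [0, 3]
x_star = dichotomous_search(lambda x: (x - 1.3) ** 2, 0.0, 3.0)
```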


(2) Interval equal-division methods

In the dichotomous search above, determining the endpoints of the new interval requires finding the midpoint, then taking two points in a small neighborhood around it, then comparing their function values, which is somewhat cumbersome; a more direct idea is simply to divide the interval into equal parts.

1. Two-point equal division

Since we want to place two points inside the interval, we divide the original interval into three equal parts. Comparing the function values at the two interior points lets us discard one third of the interval, so each round shrinks the interval to 2/3 of its length at the cost of two new function evaluations.

2. Three-point equal division

Because the two-point method shrinks the interval relatively slowly, a three-point version was proposed: each round divides the current interval into four equal parts, and the three quarter-points serve as candidate points. The new interval is half of the old one, and its midpoint coincides with one of the previous quarter-points, so after the first round only two new function evaluations are needed per round (see the sketch below).
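A minimal sketch of the three-point equal division, assuming f is unimodal on [a, b]:

```python
def three_point_equal_division(f, a, b, tol=1e-4):
    """Each round keeps the half-interval centred on the best of the three
    quarter-points; the midpoint of the new interval is always reused."""
    mid = (a + b) / 2
    f_mid = f(mid)                      # the first round evaluates all three points
    while (b - a) > tol:
        q = (b - a) / 4
        x1, x3 = a + q, b - q           # the two outer quarter-points (2 new evaluations)
        f1, f3 = f(x1), f(x3)
        if f1 < f_mid:                  # best point is x1 -> keep [a, mid], new midpoint x1
            b, mid, f_mid = mid, x1, f1
        elif f3 < f_mid:                # best point is x3 -> keep [mid, b], new midpoint x3
            a, mid, f_mid = mid, x3, f3
        else:                           # best point is the midpoint -> keep [x1, x3]
            a, b = x1, x3
    return mid
```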

3. Comparing the computational cost

The [iteration rate] comparison above looks only at how fast the interval shrinks per round, which does not reflect the overall speed of the algorithm; otherwise the reader might mistakenly conclude that dividing the interval into ever more parts automatically shrinks the subinterval faster.
Do not forget that more division points also mean more function values that have to be computed at those points.

Below we qualitatively analyze the overall efficiency of the two-point and three-point equal-division methods.

For the two-point method, the interval shrinks by a factor of 2/3 per round, and each round requires two new function evaluations; for the three-point method, the interval shrinks by a factor of 1/2 per round, and apart from the first round (which computes three points) each round requires only two new evaluations.

In summary, the [three-point equal-division method] is more efficient than the [two-point equal-division method].
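Measured per function evaluation (ignoring the extra evaluation of the first round), the reduction factors are

$$
\text{two-point: } \left(\tfrac{2}{3}\right)^{1/2} \approx 0.816,
\qquad
\text{three-point: } \left(\tfrac{1}{2}\right)^{1/2} \approx 0.707,
$$

so for the same number of function evaluations the three-point method narrows the interval faster.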


[Example] Searching for the optimal point with the three-point equal-division method
(figure omitted)


(3) Fibonacci search method

From the standpoint of reducing the amount of computation: each round needs two candidate points in order to determine the new interval (the conclusion given earlier). Can we arrange things so that each round computes only one new point, with the other point reused from an earlier round?

1. The origin of "Fibonacci"
In the last round, the two candidate points are placed symmetrically about the midpoint of the interval, a small distance ε apart (just as in the dichotomous search), so the interval lengths of the last two rounds satisfy L_n = (L_{n-1} + ε)/2, that is, L_{n-1} = 2L_n − ε.
In every earlier round, reusing one of the previously computed points forces the retained point to sit in the correct symmetric position inside the new interval, which requires L_k = L_{k+1} + L_{k+2}.
This is exactly the Fibonacci recurrence for the interval lengths: ignoring the small ε, successive interval lengths shrink according to the Fibonacci numbers, so the ratio between the initial and final interval lengths is (approximately) a Fibonacci number.
2. Algorithm process
(figure omitted)
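A minimal sketch of the procedure, assuming f is unimodal on [a, b] and n ≥ 3 function evaluations are allowed (the final round offsets the new point by a small eps, since the two ideal positions would otherwise coincide):

```python
def fibonacci_search(f, a, b, n, eps=1e-6):
    """Locate the minimizer of a unimodal f on [a, b] with n evaluations;
    after the first round only one new point is computed per round."""
    F = [1, 1]                             # Fibonacci numbers F[0] = F[1] = 1, F[2] = 2, ...
    while len(F) <= n:
        F.append(F[-1] + F[-2])
    x1 = a + F[n - 2] / F[n] * (b - a)
    x2 = a + F[n - 1] / F[n] * (b - a)
    f1, f2 = f(x1), f(x2)
    for k in range(n, 2, -1):
        if f1 < f2:                        # the minimizer lies in [a, x2]
            b, x2, f2 = x2, x1, f1         # reuse the old x1 as the new x2
            x1 = a + F[k - 3] / F[k - 1] * (b - a)
            if k == 3:
                x1 -= eps                  # last round: separate the coinciding points
            f1 = f(x1)
        else:                              # the minimizer lies in [x1, b]
            a, x1, f1 = x1, x2, f2         # reuse the old x2 as the new x1
            x2 = a + F[k - 2] / F[k - 1] * (b - a)
            if k == 3:
                x2 += eps
            f2 = f(x2)
    return (a + x2) / 2 if f1 < f2 else (x1 + b) / 2

# usage: minimize (x - 1.3)^2 on [0, 3] with 10 evaluations
x_star = fibonacci_search(lambda x: (x - 1.3) ** 2, 0.0, 3.0, n=10)
```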
3. Example
(figure omitted)
4. Evaluation

(1) Advantages
① For a given number of rounds it reduces the search interval to the minimum possible length
② It uses no gradient-related computations
③ It is easy to implement on a computer
④ Each iteration needs to keep track of only four points
⑤ The search error (the length of the final uncertainty interval) can be computed in advance

(2) Disadvantages
The number of iterations n and the small offset ε must be determined before the computation starts.
This shortcoming can be worked around by simply iterating until the search interval is "sufficiently small" and stopping there, although some efficiency is lost this way.


(4) Golden section method (0.618 method)

1. The basic idea
Keep the ratio between the lengths of successive intervals constant (in Fibonacci search this ratio is not constant).

By comparison, the dichotomy and the equal-division methods discussed earlier do keep the ratio of new to old interval length constant, but they compute several new candidate points in each iteration; the golden section method aims to keep the advantage of "computing as few new candidate points as possible" (only one per round, as in Fibonacci search) while also keeping the reduction ratio constant.

Based on the mathematical relationship above (the reused point must again sit in the correct symmetric position inside the new interval), the reduction ratio λ must satisfy the same equation in every iteration. After rearranging (the algebra is omitted), one obtains the quadratic equation λ² + λ − 1 = 0; keeping the positive root gives λ = (√5 − 1)/2 ≈ 0.618, which is the origin of the name [0.618 method].

2. Calculation efficiency
(figure omitted)
3. Example
(figure omitted)

The golden section method is in fact more efficient than the three-point equal-division method:
the three-point method reduces the interval to half of its length per round, which looks faster than 0.618, but for the same number of function evaluations the golden section method narrows the interval more (the relevant measure is the reduction achieved per evaluation: 0.618 per evaluation versus (1/2)^{1/2} ≈ 0.707 per evaluation).
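A minimal sketch of the golden section search, assuming f is unimodal on [a, b]:

```python
import math

def golden_section_search(f, a, b, tol=1e-5):
    """Shrink [a, b] around the minimizer of a unimodal f; each round reuses one
    of the two interior points and reduces the interval by the factor 0.618."""
    ratio = (math.sqrt(5) - 1) / 2          # the golden ratio coefficient, about 0.618
    x1 = b - ratio * (b - a)
    x2 = a + ratio * (b - a)
    f1, f2 = f(x1), f(x2)
    while (b - a) > tol:
        if f1 < f2:                         # the minimizer lies in [a, x2]
            b, x2, f2 = x2, x1, f1          # reuse the old x1 as the new x2
            x1 = b - ratio * (b - a)
            f1 = f(x1)
        else:                               # the minimizer lies in [x1, b]
            a, x1, f1 = x1, x2, f2          # reuse the old x2 as the new x1
            x2 = a + ratio * (b - a)
            f2 = f(x2)
    return (a + b) / 2

# usage: minimize (x - 1.3)^2 on [0, 3]
x_star = golden_section_search(lambda x: (x - 1.3) ** 2, 0.0, 3.0)
```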

4. Summary of the search algorithms so far

① All of the one-dimensional search techniques discussed so far can easily be implemented in a program
② But to reach higher accuracy they need to go through many more rounds of iteration

In practical applications we often do not need high accuracy from the one-dimensional search. For example, in a multivariate optimization problem we often decompose the problem into a sequence of one-dimensional optimization problems, and each of these only needs a rough approximate solution.

The optimum of a polynomial function (especially a quadratic one) is very easy to find, which suggests the following idea: first approximate the function we want to optimize by a polynomial, find the optimum of that polynomial, and take it as an approximate optimum of the original function.


(5) Quadratic Interpolation

1. Algorithm ideas and prototypes

The following description takes the minimization problem as an example

For the function f(x) to be optimized, at a given initial point x_0 and along a descent direction v at this point (for example, the negative gradient), we want to find the next point x_1 = x_0 + λ_1 v, where the step length λ_1 must minimize the value of f(x_0 + λv) regarded as a function of the parameter λ.

According to this description, the problem is equivalent to minimizing the scalar function g(λ) = f(x_0 + λv) with respect to the single variable λ.

[Parameter restrictions]
Since the direction v has been chosen to be a descent direction of f(x) at x_0, we require λ ≥ 0.
At the same time, to prevent problems such as divergence caused by taking too large a step during the search, we impose an upper bound m on the parameter λ.

In summary, the original one-dimensional search problem for f(x) is transformed into the minimization
of g(λ) = f(x_0 + λv) over 0 ≤ λ ≤ m.

2. Algorithm framework of the quadratic interpolation method

  • Whether we expand outward by the rule 2λ or shrink inward by the rule (1/2)λ, the point of these operations is to find three points at which the function values are "high-low-high", which guarantees that the fitted quadratic curve first falls and then rises and therefore has a minimum.
  • In both cases, once the three points a, b, and c have been determined, the candidate optimal point λ̂ is obtained from the interpolation formula; its function value is compared with the function value at b, and the better of the two is taken as the final approximate solution (a sketch of this step is given after the examples below).
3. Examples
(figure omitted)

From the solution processes of the two examples above we can see that the method is efficient but not very accurate; for the multivariate optimization problems considered later, this is not a serious drawback.
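A minimal sketch of the interpolation step described above (the formula is the standard minimizer of the quadratic passing through three given points; the bracketing step that produces the three points is assumed to have been done already):

```python
def quadratic_interpolation_step(g, la, lb, lc):
    """Given three points la < lb < lc with g(lb) <= g(la) and g(lb) <= g(lc),
    fit a quadratic through them and return the better of its minimizer and lb."""
    ga, gb, gc = g(la), g(lb), g(lc)
    num = ga * (lb**2 - lc**2) + gb * (lc**2 - la**2) + gc * (la**2 - lb**2)
    den = 2.0 * (ga * (lb - lc) + gb * (lc - la) + gc * (la - lb))
    lam_hat = num / den                     # minimizer of the fitted quadratic
    # keep whichever of lb and lam_hat gives the smaller function value
    return lam_hat if g(lam_hat) < gb else lb

# usage: one step for g(lam) = (lam - 0.7)^2 + 1 with the bracket 0 < 0.5 < 2
lam = quadratic_interpolation_step(lambda t: (t - 0.7) ** 2 + 1, 0.0, 0.5, 2.0)
```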


(6) Newton's method

None of the algorithms discussed so far uses the concept of a [derivative] or a [gradient]; Newton's method does.

1. Algorithm ideas

  • The method assumes that the function to be minimized is at least twice continuously differentiable.
  • In numerical analysis, Newton's method is used to find approximate roots of an equation f(x) = 0. From the earlier discussion we know that a candidate optimal point of a function usually falls at a stationary point (a point where the first derivative is 0), so we can apply Newton's method to find a zero of the first derivative. This gives the iteration x_{n+1} = x_n − f′(x_n)/f″(x_n), sketched in code after this list.
  • The previous methods search from one interval to a smaller interval, whereas Newton's method iterates from one point to the next point.
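A minimal sketch of the one-dimensional Newton iteration (df and d2f are the first and second derivatives, supplied by the caller):

```python
def newton_1d(df, d2f, x0, tol=1e-8, max_iter=50):
    """Find a stationary point of f by applying Newton's method to f'(x) = 0,
    i.e. iterate x_{n+1} = x_n - f'(x_n) / f''(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:          # stop once the update becomes negligible
            return x
    return x

# usage: minimize f(x) = x^2 - 2x + 3, whose derivatives are f'(x) = 2x - 2 and f''(x) = 2
x_min = newton_1d(lambda x: 2 * x - 2, lambda x: 2.0, x0=5.0)   # converges to x = 1
```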

2. Related conclusions
① When the Hessian H(x) is positive definite (i.e., f(x) is a strictly convex function), the algorithm converges very quickly, regardless of the initial point.

For the definition of convex functions, their convexity properties, and the related theorems, see the earlier post "[Optimization] Basic Concepts of Optimization Theory".

② A disadvantage of Newton's method is that it has to compute the second derivative in every iteration, which can be computationally expensive.

For a quadratic function f(x) = (1/2)xᵀGx + bᵀx + c, however, the Hessian matrix (the second derivative) is a constant matrix that does not depend on x. By the rules of matrix differentiation:

∇f(x) = Gx + b
∇²f(x) = G

Since the second derivative is the constant matrix G, substituting it into the Newton iteration formula means that each round of iteration only requires the gradient of f at the current point x_n.
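In fact, for such a quadratic function (with G positive definite) a single Newton step already lands on the minimizer:

$$
x_1 = x_0 - G^{-1}\nabla f(x_0) = x_0 - G^{-1}(Gx_0 + b) = -G^{-1}b,
$$

which is exactly the stationary point of f.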

③ Based on the second point, when we use Newton's method we often approximate the function locally by a quadratic, i.e. by its second-order Taylor expansion f(x) ≈ f(x_n) + f′(x_n)(x − x_n) + (1/2) f″(x_n)(x − x_n)².
3. Example
(figure omitted)


Origin: blog.csdn.net/kodoshinichi/article/details/109890226