[Optimization] The basic concepts of optimization theory


1. Definition and classification of optimization theory

1. General form

(1) Formula definition:
min f(x),  x ∈ X
The symbols in the formula above have the following meanings:

  • f(x): cost function (objective function)
  • x: decision variable (a scalar or a vector)
  • X: constraint set, a subset of the n-dimensional real space R^n

(2) Simple classification:

When X is a proper subset of R^n, the problem described above is called a constrained optimization problem.

For example, the following is a constrained optimization problem:
min x^2, s.t. x + 1 ≤ 0

When X is the whole space R^n, the problem described above is called an unconstrained optimization problem.

For example, the following is an unconstrained optimization problem:
min x^2, x ∈ R

(3) Explanation
Although the definition states the problem uniformly in minimization form, minimization and maximization can in fact be converted into each other:
Maximize f(x) ↔ Minimize -f(x)
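
As a minimal sketch of this equivalence (my own illustration, not from the original post), the following Python snippet maximizes f(x) = -(x - 3)^2 by minimizing -f(x) with scipy.optimize.minimize_scalar:

```python
# Illustrative sketch (assumed setup): maximize f(x) = -(x - 3)^2
# by minimizing -f(x). The maximizer of f is the minimizer of -f.
from scipy.optimize import minimize_scalar

def f(x):
    return -(x - 3) ** 2          # concave; maximum at x = 3

res = minimize_scalar(lambda x: -f(x))   # minimize -f instead of maximizing f
print(res.x)        # ~3.0, the argmax of f
print(-res.fun)     # ~0.0, the maximum value of f
```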


[Example] Given an actual problem description, write down the corresponding optimization form.
(figure: worked example, omitted)


2. Linear programming problem

In the previous section, we simply classified optimization problems into constrained/unconstrained categories based on whether there were constraints.
In practice, there are other ways to classify optimization problems; linear programming (Linear Programming, LP) is one of them.

(1) Definition

The objective function is linear + the constraints are also linear → the optimization problem is a linear programming problem

(2) The general form
Minimize f(x) ≡ c^T · x,  x ∈ X

The parameters in this definition are specified as follows:

  • Constraint set X = {x ∈ R^n | Ax = b, x ≥ 0}

Here Ax = b means the linear constraints are equality constraints; they may also appear in the inequality form Ax ≤ b.

  • Because the expressions are linear, the symbols above are all vectors and matrices: c ∈ R^n, b ∈ R^m, A ∈ R^{m×n}
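
As a hedged illustration (not part of the original post), a linear program in this standard form can be solved with scipy.optimize.linprog; the data c, A, b below are invented purely for the example:

```python
# Illustrative sketch: min c^T x  s.t.  A x = b, x >= 0.
# The numbers here are made up for demonstration only.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 0.0])          # c in R^n, n = 3
A = np.array([[1.0, 1.0, 1.0]])        # A in R^{m x n}, m = 1
b = np.array([1.0])                    # b in R^m

# linprog's default variable bounds are x >= 0, matching the standard form above.
res = linprog(c, A_eq=A, b_eq=b)
print(res.x, res.fun)                  # optimal x and optimal value c^T x
```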

(3) NLP (NonLinear Programming)
When the objective function or the constraints are nonlinear, the optimization problem is a nonlinear programming problem.

Most of the discussion in this course concerns nonlinear programming, because real-world conditions are complex and rarely satisfy linearity.
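
For contrast with the LP case, here is a minimal nonlinear-programming sketch (my own example, assuming SciPy is available): a quadratic objective with a nonlinear inequality constraint, solved with scipy.optimize.minimize:

```python
# Illustrative sketch of an NLP: min (x1 - 1)^2 + (x2 - 2)^2
# subject to the nonlinear constraint x1^2 + x2^2 <= 1.
import numpy as np
from scipy.optimize import minimize

obj = lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2
# SciPy's 'ineq' constraints are of the form g(x) >= 0.
cons = [{"type": "ineq", "fun": lambda x: 1.0 - x[0] ** 2 - x[1] ** 2}]

res = minimize(obj, x0=np.array([0.0, 0.0]), method="SLSQP", constraints=cons)
print(res.x)   # approximately the point of the unit disk closest to (1, 2)
```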

3. Other types of optimization problems

(1) Integer Programming

Also known as Discrete Optimization: all decision variables (or vectors) in the problem definition and solution are required to take discrete integer values.

(2) Mixed Integer Programming

Some of the decision variables are integer and some are continuous.

(3) Dynamic Programming / Optimal Control (Dynamic Optimization)

The dynamics of the system must be taken into account, often including the effect of time.

(4) Stochastic Optimization

Uncertain factors in the problem must be considered; for example, some of the variables involved are random variables rather than deterministic quantities.
e.g. the passenger flow arriving at a service system during a given period is random and uncertain.

(5) Multi-Objective Optimization

When optimizing, there are multiple objective functions that need to be optimized at the same time.
For example, when building a house, you need to consider the goals of "short construction period", "high quality", and "low cost" at the same time.

Multi-objective problems usually cannot be decomposed into several individual single-objective problems and solved separately, because the objectives also constrain and conflict with one another.

(6) Game Theory

Multiple decision makers are usually involved, and information is distributed asymmetrically (you do not know what the others are thinking), so each player can only obtain partial information.


2. Basic concepts in optimization

1. Convex set

(figure: mathematical definition of a convex set, omitted)

According to the mathematical definition of a convex set, its essential meaning is this:
if a set is convex, then the line segment connecting any two points of the set also lies entirely inside the set; formally, λx_1 + (1-λ)x_2 ∈ X for all x_1, x_2 ∈ X and λ ∈ [0, 1].

Example of convex set:
X = {x | ||x||_2 ≤ 5, x ∈ R^2}, which is the disk in the two-dimensional plane centered at the origin with radius 5.

Examples of non-convex sets:
Y = {y | 2 ≤ ||y||_2 ≤ 6, y ∈ R^2}, which is the ring (annulus) in the two-dimensional plane centered at the origin with inner radius 2 and outer radius 6.

Here the symbol ||·||_2 denotes the 2-norm of a vector.
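
As a rough numerical illustration of the definition (my own sketch, not from the post), one can sample pairs of points in each set and test whether the connecting segment stays inside; the disk X passes the test while the ring Y does not:

```python
# Illustrative sketch: empirically test convexity of the disk X and the ring Y
# by sampling pairs of member points and checking points on the segment between them.
import numpy as np

rng = np.random.default_rng(0)

in_disk = lambda p: np.linalg.norm(p) <= 5           # X = {x : ||x||_2 <= 5}
in_ring = lambda p: 2 <= np.linalg.norm(p) <= 6      # Y = {y : 2 <= ||y||_2 <= 6}

def looks_convex(member, trials=2000):
    """Return False as soon as a segment between two members leaves the set."""
    for _ in range(trials):
        a, b = rng.uniform(-6, 6, 2), rng.uniform(-6, 6, 2)
        if member(a) and member(b):
            for t in np.linspace(0, 1, 11):
                if not member(t * a + (1 - t) * b):
                    return False
    return True

print(looks_convex(in_disk))   # True  (no counterexample found)
print(looks_convex(in_ring))   # False (likely; e.g. a segment crossing the hole)
```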

2. Hyperplane

A hyperplane is a set of points satisfying the following constraints
X = {x | c^T · x = z}
where c is a nonzero vector and z is a scalar constant.

An example of a two-dimensional hyperplane:
x ∈ R^2, the set of points satisfying [1, 1]·[x_1, x_2]^T = 2; the corresponding hyperplane is the straight line x_1 + x_2 = 2 in the two-dimensional plane.

An example of a three-dimensional hyperplane:
x ∈ R^3, the set of points satisfying [1, 2, 3]·[x_1, x_2, x_3]^T = 2; the corresponding hyperplane is the plane x_1 + 2x_2 + 3x_3 = 2 in three-dimensional space.

Once a hyperplane is defined, it divides the space into two parts: one is {x | c^T·x ≤ z}, the other is {x | c^T·x > z}.
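
A tiny sketch (my own) of this split for the two-dimensional hyperplane x_1 + x_2 = 2: evaluate c^T x and compare it with z to see which side a point lies on:

```python
# Illustrative sketch: the hyperplane c^T x = z with c = [1, 1], z = 2
# splits R^2 into the half-spaces c^T x <= z and c^T x > z.
import numpy as np

c, z = np.array([1.0, 1.0]), 2.0

def side(x):
    return "on the hyperplane" if np.isclose(c @ x, z) else (
        "half-space c^T x <= z" if c @ x < z else "half-space c^T x > z")

print(side(np.array([1.0, 1.0])))   # on the hyperplane (1 + 1 = 2)
print(side(np.array([0.0, 0.0])))   # half-space c^T x <= z
print(side(np.array([3.0, 1.0])))   # half-space c^T x > z
```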

3. The supporting hyperplane of a boundary point of the convex set

Given a convex set X and a boundary point w of X, a hyperplane c^T x = z is a supporting hyperplane at w if it satisfies:

  • c^T w = z (the hyperplane passes through the boundary point)
  • c^T x ≥ z (or c^T x ≤ z) for all x ∈ X (all points of the convex set lie on the same side of the supporting hyperplane)
    (figure: illustration of a supporting hyperplane, omitted)

4. Convex function (and concave function)

"Defined on a convex set, and the function value of the linear combination of any two independent variables is not greater than the linear combination of the two independent variable function values", such a function is called a convex function.

(1) Definition: f is convex on the convex set X if, for any x_1, x_2 ∈ X and any λ ∈ [0, 1], f(λx_1 + (1-λ)x_2) ≤ λf(x_1) + (1-λ)f(x_2). (original figure omitted)
(2) Convex / concave / neither convex nor concave / both convex and concave functions
(figure: examples of each type, omitted)
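
As a numerical sanity check of the defining inequality (my own sketch, not from the post), compare a convex function such as x^2 with a non-convex one such as sin x:

```python
# Illustrative sketch: test the convexity inequality
#   f(l*x1 + (1-l)*x2) <= l*f(x1) + (1-l)*f(x2)
# on random points for f(x) = x^2 (convex) and f(x) = sin(x) (not convex).
import numpy as np

rng = np.random.default_rng(1)

def violates_convexity(f, trials=5000):
    for _ in range(trials):
        x1, x2 = rng.uniform(-5, 5, 2)
        lam = rng.uniform(0, 1)
        if f(lam * x1 + (1 - lam) * x2) > lam * f(x1) + (1 - lam) * f(x2) + 1e-12:
            return True
    return False

print(violates_convexity(lambda x: x ** 2))   # False: no violation found
print(violates_convexity(np.sin))             # True:  sin is not convex on R
```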


【Review of Knowledge in Calculus】

For the subsequent understanding and calculation, the concepts of "differential" and "directional derivative" are now reviewed.

(figure: review of differentials and directional derivatives, omitted)


5. Three convexity theorems for differentiable functions

When defining convex functions we did not require f(x) to be differentiable, but here we only discuss the properties of differentiable functions.

(1) Theorem 1

If f(x) is a differentiable function defined on the convex set X, then f(x) is convex if and only if
f(x_2) - f(x_1) ≥ ∇f(x_1)^T · (x_2 - x_1) for all x_1, x_2 ∈ X
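
A quick numerical illustration of Theorem 1 (my own sketch) for the convex function f(x) = ||x||^2, whose gradient is 2x:

```python
# Illustrative sketch: check  f(x2) - f(x1) >= grad f(x1)^T (x2 - x1)
# on random pairs, with f(x) = ||x||^2 and grad f(x) = 2x.
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: x @ x
grad = lambda x: 2 * x

ok = all(
    f(x2) - f(x1) >= grad(x1) @ (x2 - x1) - 1e-12
    for x1, x2 in (rng.normal(size=(2, 3)) for _ in range(1000))
)
print(ok)   # True: the first-order condition holds for this convex f
```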

(2) Theorem 2

If f(x) is a twice-differentiable function defined on the convex open set X, then f(x) is convex if and only if the Hessian matrix H(x) of f(x) is positive semi-definite for every x ∈ X.

[Background knowledge supplement]

  • Open set: for every point in the set, there exists a neighborhood of that point that lies entirely inside the set.
    Roughly speaking, an open set is a "set without its boundary", e.g. {x | ||x||_2 < 5} is an open set, but changing "<" to "≤" makes it no longer open.
  • Positive semi-definite matrix: if A is a symmetric matrix and x^T A x ≥ 0 for every vector x, then A is said to be positive semi-definite.
    ps: the definition and theorems on positive definiteness of matrices are also discussed in the matrix theory series post "[Matrix Theory] Hermite Quadratic Form (2)".
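
Putting Theorem 2 and the positive semi-definiteness test together, here is a small numerical sketch (mine, not from the post) for the quadratic f(x) = x^T Q x, whose Hessian is the constant matrix 2Q:

```python
# Illustrative sketch: for f(x) = x^T Q x (Q symmetric) the Hessian is 2Q,
# so convexity on R^n reduces to checking that 2Q is positive semi-definite.
import numpy as np

Q = np.array([[2.0, 1.0],
              [1.0, 2.0]])
H = 2 * Q                                   # constant Hessian of f

eigenvalues = np.linalg.eigvalsh(H)         # eigvalsh: eigenvalues of a symmetric matrix
print(eigenvalues)                          # [2., 6.] -> all >= 0
print(bool(np.all(eigenvalues >= -1e-12)))  # True: H is positive semi-definite
```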

[Proof]
(figure: proof omitted)
After working through the proof above, you can see that Theorem 2 builds on Theorem 1.

(3) Theorem 3

If f(x) is a twice-differentiable function defined on a convex open set X, then a sufficient but not necessary condition for f(x) to be strictly convex on X is that the Hessian matrix H(x) is positive definite for every x ∈ X.

The condition is not necessary because, by Theorem 2, convexity only guarantees that H(x) is positive semi-definite; in fact, even strict convexity does not force H(x) to be positive definite everywhere: f(x) = x^4 is strictly convex on R, yet its second derivative is 0 at x = 0, so the Hessian there is only positive semi-definite.

But if H(x) is positive definite for every x ∈ X, it can be concluded that f(x) must be strictly convex, and in particular convex.


[Example] Investigate the convexity of a given function
(figure: worked example, omitted)
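
The worked example itself is in the missing image; as an illustrative stand-in following the same procedure (my own example, not the author's), here is the Hessian test applied to f(x, y) = x^2 + 2y^2 - xy:

```python
# Illustrative stand-in example (not the one from the original image):
# investigate the convexity of f(x, y) = x^2 + 2*y^2 - x*y via its Hessian.
import numpy as np

H = np.array([[2.0, -1.0],     # d2f/dx2,  d2f/dxdy
              [-1.0, 4.0]])    # d2f/dydx, d2f/dy2

eigs = np.linalg.eigvalsh(H)
print(eigs)                    # both eigenvalues positive (3 +/- sqrt(2))
# H is positive definite everywhere, so by Theorem 3 f is strictly convex on R^2.
```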


3. Optimality conditions

1. Local (global) minimum

(figure: illustration of local and global minimum points, omitted)

  • Local minimum point: there is a neighborhood of the point in which the function value at every point is not less than the function value at the point itself
  • Global minimum point: the function value at every point of the domain is not less than the function value at the point itself

[Clearing up some common misconceptions]
① A point may be both a minimum point and a maximum point
(figure omitted)
② The global extremum point is not necessarily unique, e.g. sin x
③ A global extremum point may also be a local extremum point, e.g. sin x

2. Solving for extrema

(1) Most of the available theory and theorems help us determine local extrema of the problem.
(2) The extremum of the convex function:
①If the objective function is convex , the local minimum found is also the global minimum ;

Proof:
(figure: proof omitted)

② If the objective function is strictly convex, then the global minimum is unique.

(3) In practice, randomized algorithms such as "simulated annealing" or "genetic algorithms" are used to search for the global optimum.

The analogy of "an ant without a map trying to find the highest peak" helps to understand the ideas behind the following three algorithms:
[Classic gradient descent]: from the current position, search a small local range for the direction that climbs fastest, take a small step in that direction, and then repeat the process.
ps: to improve the chance of finding the global optimum, you can start the search from multiple starting points.
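
A bare-bones sketch of the same idea in code (my own, applied to minimization of a simple bowl-shaped function), including the multi-start trick mentioned above:

```python
# Illustrative sketch of plain gradient descent on f(x, y) = (x - 1)^2 + (y + 2)^2,
# restarted from several random starting points.
import numpy as np

f = lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2
grad = lambda p: np.array([2 * (p[0] - 1), 2 * (p[1] + 2)])   # gradient of f

def descend(start, step=0.1, iters=200):
    p = np.array(start, dtype=float)
    for _ in range(iters):
        p -= step * grad(p)           # take a small step against the gradient
    return p

rng = np.random.default_rng(3)
starts = rng.uniform(-10, 10, size=(5, 2))        # multiple starting points
best = min((descend(s) for s in starts), key=f)
print(best)                                       # ~ [1, -2], the minimizer of f
```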

[Simulated annealing]: a drunk ant wanders randomly, sometimes uphill and sometimes downhill; as time passes the ant gradually sobers up and becomes more and more inclined to move toward higher ground.

[Genetic algorithm]: a colony of ants starts searching from different places; periodically a flood is sent to drown the ants at lower positions, the surviving ants exchange information (reproduce), and their offspring start searching from relatively higher terrain.
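
To make the simulated-annealing idea above concrete, here is a rough sketch (my own, not from the post): worse moves are accepted with a probability that shrinks as the "temperature" cools.

```python
# Illustrative sketch of simulated annealing on a bumpy 1-D function:
# accept worse moves with a probability that decreases as the temperature cools.
import math
import random

random.seed(4)
f = lambda x: x ** 2 + 10 * math.sin(x)      # multimodal test function

x = 8.0                                      # "drunk ant" starting point
T = 5.0                                      # initial temperature
for _ in range(20000):
    cand = x + random.uniform(-1, 1)         # random nearby move
    delta = f(cand) - f(x)
    if delta < 0 or random.random() < math.exp(-delta / T):
        x = cand                             # accept downhill always, uphill with prob e^(-delta/T)
    T = max(1e-3, T * 0.999)                 # cool down: the ant "sobers up"

print(x, f(x))   # likely near the global minimum around x ~ -1.3
```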

3. Saddle point and extreme value discriminant theorem

(1) The definition of saddle point
(figure: definition of a saddle point, omitted)
(2) Extreme value discriminant theorem
(figure: statement of the discriminant theorem, omitted)

[Example] The worked example was included as a screenshot; the image quality is not great, please bear with it.
(figure: worked example, omitted)

Origin blog.csdn.net/kodoshinichi/article/details/109736488