Basic concepts of optimization theory
1. Definition and classification of optimization theory
1. General form
(1) Formula definition:
min f(x), where x ∈ X
Among them, for the letters in the above formula, the meaning is as follows:
- f(x): cost function (objective function)
- x: decision vector (single value or vector)
- X: constraint set, which must be a subset of the n-dimensional real space R^n
(2) Simple classification:
When X is a proper subset of the space R^n, the problem described above is called a constrained optimization problem.
The problem in the following formula is the constrained optimization problem
min x^2, s.t. x + 1 ≤ 0
When X is the entire space R^n, the problem described above is called an unconstrained optimization problem.
The problem in the following formula is the unconstrained optimization problem
min x^2, x ∈ R
(3) Explanation
Although the definition uniformly states the problem in minimization form, minimization and maximization can in fact be transformed into each other:
Maximize f(x) ↔ Minimize -f(x)
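A quick sketch of this equivalence (assuming scipy is available; the function f below is made up for illustration): maximize f by minimizing −f.

```python
# Maximizing f(x) is the same as minimizing -f(x): the maximizer is
# unchanged and max f = -min(-f). Illustrated on the made-up function
# f(x) = -(x - 2)^2 + 3, whose maximum value is f(2) = 3.
from scipy.optimize import minimize_scalar

def f(x):
    return -(x - 2) ** 2 + 3

res = minimize_scalar(lambda x: -f(x))  # minimize the negated objective
x_star, max_val = res.x, -res.fun
print(x_star, max_val)  # x* ≈ 2, max f ≈ 3
```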
[Example] Given a real-world problem statement, write down the corresponding optimization form.
2. Linear programming problem
In the previous section, we simply classified optimization problems into constrained/unconstrained categories based on whether there were constraints.
In practice, optimization problems admit other classification criteria as well; linear programming [Linear Programming] is one of them.
(1) Definition
The objective function is linear + the constraints are also linear → the optimization problem is a linear programming problem
(2) The general form
Minimize f(x) ≡ c^T·x, where x ∈ X
Among them, the following provisions are made for the relevant parameters in the definition:
- Constraint set X = {x ∈ R^n | Ax = b, x ≥ 0}
Here Ax = b means the linear constraint is an equality constraint; it can also take the inequality form Ax ≤ b.
- Because the expressions are linear, the symbols above are all vectors and matrices: c ∈ R^n, b ∈ R^m, A ∈ R^{m×n}
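A minimal sketch of this general form (assuming scipy is available; the numbers below are made up): minimize c^T·x subject to Ax = b, x ≥ 0.

```python
# Made-up LP: minimize x1 + 2*x2  subject to  x1 + x2 = 4, x >= 0.
# The optimum puts all weight on the cheaper variable: x = [4, 0], value 4.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])          # cost vector c in R^2
A_eq = np.array([[1.0, 1.0]])     # A in R^{1x2}
b_eq = np.array([4.0])            # b in R^1

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 2)
print(res.x, res.fun)
```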
(3) NLP (NonLinear Programming)
When the objective function is non-linear, or the limiting condition is non-linear, the optimization problem is a non-linear programming problem.
Most of the discussion in this course still concerns nonlinear programming, because real-world conditions are complex and rarely satisfy linearity.
3. Other types of optimization problems
(1) Integer Programming
Also known as Discrete Optimization: all decision variables (or vectors) in the problem definition and solution are required to take discrete integer values.
(2) Mixed Integer Programming (Mixed Integer Programming)
Some of the decision variables are integer and some are continuous.
(3) Dynamic Optimization / Optimal Control
The dynamics of the system must be taken into account, often including the effect of time.
(4) Stochastic Optimization
Need to consider the uncertain factors in the problem, for example, some of the variables involved are random variables, rather than deterministic.
e.g. In a reception/service system, the passenger flow during a given period is random and uncertain.
(5) Multi-Objective Optimization
When optimizing, there are multiple objective functions that need to be optimized at the same time.
For example, when building a house, you need to consider the goals of "short construction period", "high quality", and "low cost" at the same time.
Multi-objective optimization problems usually cannot be decomposed into several independent single-objective problems, because the objectives also constrain and contradict one another.
(6) Game Theory
There are often multiple decision makers, and information is distributed asymmetrically (you do not know what the others are thinking), so each decision maker obtains only partial information.
2. The basic concept of optimization
1. Convex set
According to the mathematical definition of a convex set, its essential meaning is this:
if a set is convex, then the line segment connecting any two points of the set lies entirely inside the set.
Example of convex set:
X = {x | ||x||_2 ≤ 5, x ∈ R^2}, i.e., the closed disc in the plane centered at the origin with radius 5.
Examples of non-convex sets:
Y = {y | 2 ≤ ||y||_2 ≤ 6, y ∈ R^2}, i.e., the annulus in the plane centered at the origin with inner radius 2 and outer radius 6.
Here ||·||_2 denotes the 2-norm (Euclidean norm) of a vector.
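A small numerical illustration (not a proof) of the two examples above, assuming numpy: the midpoint of two points of the disc X stays in X, while the midpoint of two points of the annulus Y can leave Y.

```python
# For the disc X the midpoint of any two member points stays inside;
# for the annulus Y the connecting segment can leave the set.
import numpy as np

def in_disc(p):      # X = { x : ||x||_2 <= 5 }
    return np.linalg.norm(p) <= 5

def in_annulus(p):   # Y = { y : 2 <= ||y||_2 <= 6 }
    return 2 <= np.linalg.norm(p) <= 6

a, b = np.array([3.0, 0.0]), np.array([-3.0, 0.0])  # both in X and in Y
mid = (a + b) / 2                                   # midpoint = origin
print(bool(in_disc(mid)))     # True: segment stays in the disc
print(bool(in_annulus(mid)))  # False: segment leaves the annulus
```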
2. Hyperplane
A hyperplane is a set of points satisfying the following constraints
X = {x | c^T·x = z}
where c is a nonzero vector and z is a scalar constant.
An example of a two-dimensional hyperplane:
x ∈ R^2: the point set satisfying [1,1]·[x1, x2]^T = 2 corresponds to the straight line x1 + x2 = 2 in the plane.
An example of a three-dimensional hyperplane:
x ∈ R^3: the point set satisfying [1,2,3]·[x1, x2, x3]^T = 2 corresponds to the plane x1 + 2x2 + 3x3 = 2 in three-dimensional space.
Once a hyperplane is defined, it divides the space into two parts: one is {x | c^T·x ≤ z}, the other is {x | c^T·x > z}.
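A sketch of this splitting, reusing the two-dimensional example above (c = [1, 1], z = 2); the helper `side` is made up here for illustration:

```python
# The hyperplane c^T x = z splits R^2 into two half-spaces; classify
# points by the sign of c^T x - z.
import numpy as np

c, z = np.array([1.0, 1.0]), 2.0

def side(x):
    """-1: c^T x < z, 0: on the hyperplane, +1: c^T x > z."""
    v = c @ x - z
    return 0 if abs(v) < 1e-12 else (1 if v > 0 else -1)

print(side(np.array([0.0, 0.0])))  # -1: below the line x1 + x2 = 2
print(side(np.array([1.0, 1.0])))  #  0: on the line
print(side(np.array([3.0, 3.0])))  # +1: above the line
```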
3. The supporting hyperplane of a boundary point of the convex set
Given a convex set X and a boundary point w of the set, a hyperplane c^T·x = z is a supporting hyperplane at w if it satisfies:
- c^T·w = z (the hyperplane passes through the boundary point)
- c^T·x ≥ z (or c^T·x ≤ z) for all x ∈ X (all points of the convex set lie on the same side of the supporting hyperplane)
4. Convex function (and concave function)
"Defined on a convex set, with the function value of any convex combination of two points no greater than the same convex combination of the two function values" — such a function is called a convex function. In symbols: f(λx1 + (1−λ)x2) ≤ λf(x1) + (1−λ)f(x2) for all x1, x2 in the set and all λ ∈ [0, 1].
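A quick numerical spot-check of the defining inequality (a sketch, not a proof), using the convex function f(x) = x²:

```python
# Check f(t*x1 + (1-t)*x2) <= t*f(x1) + (1-t)*f(x2) for f(x) = x^2
# at many random points and mixing weights.
import numpy as np

def f(x):
    return x ** 2

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    x1, x2 = rng.uniform(-10, 10, size=2)
    t = rng.uniform(0, 1)
    lhs = f(t * x1 + (1 - t) * x2)
    rhs = t * f(x1) + (1 - t) * f(x2)
    ok = ok and (lhs <= rhs + 1e-9)  # small tolerance for rounding
print(ok)  # True: the inequality holds in every trial
```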
(1) Definition
(2) Convex/concave/non-convex non-concave/convex and concave function
【Review of Knowledge in Calculus】
For the subsequent understanding and calculation, the concepts of "differential" and "directional derivative" are now reviewed.
5. Three theorems of convexity for differentiable convex functions
When defining convex functions we did not require f(x) to be differentiable, but here we discuss only the properties of differentiable functions.
(1) Theorem 1
If f(x) is a differentiable function defined on the convex set X, then f(x) is convex if and only if
f(x2) − f(x1) ≥ ∇f(x1)^T·(x2 − x1) for all x1, x2 ∈ X
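A numerical spot-check of Theorem 1 (illustration only), using the convex function f(x) = xᵀx, whose gradient is ∇f(x) = 2x:

```python
# Verify f(x2) - f(x1) >= grad f(x1)^T (x2 - x1) at many random pairs
# for the convex f(x) = x^T x on R^3.
import numpy as np

f = lambda x: x @ x
grad = lambda x: 2 * x

rng = np.random.default_rng(1)
holds = True
for _ in range(1000):
    x1, x2 = rng.normal(size=(2, 3))
    holds = holds and (f(x2) - f(x1) >= grad(x1) @ (x2 - x1) - 1e-9)
print(holds)  # True (the gap equals ||x2 - x1||^2 >= 0 for this f)
```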
(2) Theorem 2
If f(x) is a twice-differentiable function defined on the open convex set X, then f(x) is convex if and only if its Hessian matrix H(x) is positive semi-definite for every x ∈ X.
[Background knowledge supplement]
- Open set: For any point in the set, a neighborhood of that point can be found, and the neighborhood must be inside the set.
Roughly speaking, an open set is a "set without boundary", e.g. {x | ||x||_2 < 5} is open; changing "<" to "≤" makes it no longer open.
- Positive semi-definite matrix: if A is a symmetric matrix and x^T·A·x ≥ 0 for every vector x, then A is said to be positive semi-definite.
ps In the series of blog posts on matrix theory, "[Matrix Theory] Hermite Quadratic Form (2)" , the definition and theorem of the positive definiteness of the matrix are also discussed.
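One common numerical way to apply the definition above (a sketch; the helper name `is_psd` is made up here): a symmetric matrix is positive semi-definite iff all its eigenvalues are ≥ 0.

```python
# Check positive semi-definiteness via the eigenvalues of a symmetric matrix.
import numpy as np

def is_psd(A, tol=1e-10):
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):          # the definition requires symmetry
        return False
    return bool(np.min(np.linalg.eigvalsh(A)) >= -tol)

print(is_psd([[2, -1], [-1, 2]]))   # True  (eigenvalues 1 and 3)
print(is_psd([[1, 2], [2, 1]]))     # False (eigenvalues -1 and 3)
```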
[Proof]
After sorting out the proof above, one can see that Theorem 2 builds on Theorem 1.
(3) Theorem 3
If f(x) is a twice-differentiable function defined on an open convex set X, then a sufficient (but not necessary) condition for f(x) to be strictly convex on X is that its Hessian matrix H(x) is positive definite for every x ∈ X.
This is because, by Theorem 2, from "a twice-differentiable function on an open convex set is convex" we can only conclude that H(x) is positive semi-definite, not positive definite.
Conversely, if H(x) is positive definite everywhere, then f(x) must be strictly convex.
[Example] Investigate the convexity of a given function
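As a made-up instance of this example, consider f(x1, x2) = x1² + x1·x2 + x2², whose Hessian is the constant matrix [[2, 1], [1, 2]]; by Theorem 3, positive definiteness of the Hessian implies strict convexity:

```python
import numpy as np

# Hessian of f(x1, x2) = x1^2 + x1*x2 + x2^2 (constant in x):
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigs = np.linalg.eigvalsh(H)      # eigenvalues, ascending order
print(eigs)                       # [1. 3.] -> all positive
strictly_convex = bool(np.all(eigs > 0))
print(strictly_convex)            # True: f is strictly convex on R^2
```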
3. Optimality conditions
1. Local (global) minimum
- Local minimum point: the function value of any point in a certain neighborhood of the point is not less than the function value at that point
- Global minimum point: the value of the function at any point in the domain is not less than the value of the function at that point
[Clearing up some common misconceptions]
① An extreme point may be either a minimum point or a maximum point
② The global extreme point is not necessarily unique, e.g. sin x
③ A global extreme point may at the same time be a local extreme point, e.g. sin x
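Point ② can be illustrated with sin x (standard library only): every x = −π/2 + 2kπ attains the same global minimum value −1.

```python
import math

# Several distinct global minimizers of sin(x), all with value -1:
xs = [-math.pi / 2 + 2 * k * math.pi for k in range(3)]
vals = [math.sin(x) for x in xs]
print(vals)  # three values, each (numerically) equal to -1
```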
2. Extremum solution
(1) Most of the available theories and theorems can help determine the local extremum of the problem.
(2) The extremum of the convex function:
①If the objective function is convex , the local minimum found is also the global minimum ;
prove
② If the objective function is strictly convex, then the global minimum is unique.
(3) At present, randomized algorithms such as "simulated annealing" or "genetic algorithms" are used to search for the global optimum.
Use the picture of "an ant without a map looking for the highest peak" to understand the ideas of the three algorithms below:
[Classic Gradient Descent]: from the current position, search a small local neighborhood for the direction that climbs fastest, take a small step in that direction, and repeat.
ps To find the global optimum as far as possible, you can launch the search from multiple starting points.
[Simulated Annealing]: a drunk ant wanders randomly upwards or downwards; but as time passes the ant becomes more and more sober, and is more and more inclined to move toward higher ground.
[Genetic Algorithm]: a group of ants starts searching in different places; periodically a flood is sent to drown the ants at lower altitudes, the surviving ants exchange information (reproduce), and their offspring resume the search from relatively higher terrain.
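A toy simulated-annealing sketch of the "drunk ant" (standard library only; the landscape h and all parameters are made up for illustration, not taken from the course):

```python
# Maximize the multimodal landscape h(x) = sin(3x) + 0.5*sin(x).
# At high temperature T almost any move is accepted (the ant is drunk);
# as T decays, downhill moves are accepted less and less often.
import math
import random

def h(x):
    return math.sin(3 * x) + 0.5 * math.sin(x)

random.seed(0)
x = random.uniform(-4, 4)
T = 2.0
best_x, best_h = x, h(x)
for _ in range(20000):
    cand = x + random.gauss(0, 0.5)        # random local step
    dh = h(cand) - h(x)
    # Always accept uphill moves; accept downhill ones with prob e^(dh/T).
    if dh > 0 or random.random() < math.exp(dh / T):
        x = cand
        if h(x) > best_h:
            best_x, best_h = x, h(x)
    T = max(1e-3, T * 0.9995)              # cooling schedule
print(best_h)  # close to the global maximum of h (about 1.26)
```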
3. Saddle point and extreme value discriminant theorem
(1) The definition of saddle point
(2) Extreme value discrimination theorem
[Example] (screenshot from the original notes; the image quality is poor)