[Optimization] Introduction to Optimization (1)

Preface

This series of articles serves as notes for learning optimization. Reference: "Optimization: Modeling, Algorithms and Theory" by Wen Zaiwen.

Optimization problem summary

General form of optimization problem

The optimization problem can generally be described as:
$$
\min\quad f(x), \qquad \text{s.t.}\quad x\in\mathcal{X}, \tag{1}
$$
where $x = (x_1, x_2, \cdots, x_n)^\top \in \mathbb{R}^n$ is the decision variable, $f : \mathbb{R}^n \to \mathbb{R}$ is the objective function, and $\mathcal{X} \subseteq \mathbb{R}^n$ is the constraint set or feasible region; the points contained in the feasible region are called feasible solutions or feasible points. The notation "s.t." is an abbreviation of "subject to" and introduces the constraints. When $\mathcal{X} = \mathbb{R}^n$, problem (1) is called an unconstrained optimization problem. Among all decision variables that satisfy the constraints, a variable $x^*$ that minimizes the objective function is called an optimal solution of problem (1); that is, for any $x \in \mathcal{X}$ we have $f(x) \geqslant f(x^*)$.

PS: Note that the minimum (maximum) value of the function $f$ over the set $\mathcal{X}$ does not necessarily exist, but its infimum (supremum) $\inf f$ ($\sup f$) always exists. Therefore, when the minimum (maximum) value of the objective function does not exist, we care about its infimum (supremum); that is, the "$\min$ ($\max$)" in problem (1) is changed to "$\inf$ ($\sup$)".
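As a concrete illustration (my own example, not from the book), the following minimal sketch solves a small instance of problem (1) with SciPy; the particular objective, constraints and solver choice are assumptions made just for this example.

```python
import numpy as np
from scipy.optimize import minimize

# A small instance of problem (1):
#   min  f(x) = (x1 - 1)^2 + (x2 - 2)^2
#   s.t. x1 + x2 <= 2,  x1 >= 0,  x2 >= 0
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

constraints = [{"type": "ineq", "fun": lambda x: 2.0 - x[0] - x[1]}]  # means 2 - x1 - x2 >= 0
bounds = [(0.0, None), (0.0, None)]                                   # x1, x2 >= 0

res = minimize(f, x0=np.zeros(2), bounds=bounds, constraints=constraints)
print("optimal solution x*:", res.x, " objective value f(x*):", res.fun)
```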

Types of optimization problems

  • When both the objective function and the constraint functions are linear, the problem is called linear programming;
  • When at least one of the objective function and the constraint functions is nonlinear, the corresponding problem is called nonlinear programming;
  • If the objective function is quadratic and the constraint functions are linear, the problem is called quadratic programming;
  • Problems containing non-smooth functions are called non-smooth optimization; a smooth function is one that is continuously differentiable to arbitrary order on its domain, such as the exponential function;
  • Problems in which derivative information cannot be used are called derivative-free optimization;
  • Problems in which the variables can only take integer values are called integer programming;
  • The problem of minimizing a linear function of a positive semidefinite matrix variable under linear constraints is called semidefinite programming, and its generalized form is conic programming;
  • Classified by properties of the optimal solution:
    • Problems whose optimal solution has only a small number of non-zero elements are called sparse optimization;
    • Problems whose optimal solution is a low-rank matrix are called low-rank matrix optimization.

In addition, there are geometric optimization, second-order cone programming, tensor optimization, robust optimization, global optimization, combinatorial optimization, network optimization, stochastic optimization, dynamic programming, optimization with differential equation constraints, optimization with differential manifold constraints, distributed optimization, and so on.

Example: Sparse Optimization

Consider the problem of solving a system of linear equations:

$$
Ax = b, \tag{2}
$$
where the vector $x \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, the matrix $A \in \mathbb{R}^{m\times n}$, and the dimension of $b$ is much smaller than that of $x$, i.e., $m \ll n$. The system of equations is underdetermined, so it has infinitely many solutions. If we have the prior information that the solution is sparse, and $A$ together with the solution $u$ of the original problem satisfies certain conditions, then we can distinguish $u$ from the other solutions of the system (2). This kind of technique is widely used in compressed sensing, which recovers the complete signal from a small portion of the measurements.

Properties such as sparsity can theoretically guarantee that $u$ is the unique solution of the system (2) with the fewest non-zero elements, that is, $u$ is the optimal solution of the following $\ell_0$-norm problem:

$$
\begin{aligned}
\min_{x\in\mathbb{R}^n}\quad & \|x\|_0,\\
\text{s.t.}\quad & Ax=b.
\end{aligned} \tag{3}
$$
where $\|x\|_0$ denotes the number of non-zero elements of $x$. Since $\|x\|_0$ is a discontinuous function that takes only integer values, problem (3) is in fact NP (non-deterministic polynomial) hard and very difficult to solve. Converting it into an $\|x\|_1$ problem makes it solvable:
$$
\begin{aligned}
\min_{x\in\mathbb{R}^n}\quad & \|x\|_1,\\
\text{s.t.}\quad & Ax=b.
\end{aligned} \tag{4}
$$
However, converting it into a 2-norm problem generally fails to recover the sparse solution.
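Before the geometric explanation below, here is a quick numerical illustration (a sketch of mine, not taken from the book). Problem (4) can be rewritten as a linear program by splitting $x = x^{+} - x^{-}$ with $x^{+}, x^{-} \geqslant 0$ and solved with scipy.optimize.linprog, while the minimum $\ell_2$-norm solution of (2) is given by the pseudoinverse; the planted sparse solution is typically recovered by the $\ell_1$ formulation but not by the $\ell_2$ one. The problem sizes and random data are assumptions made for the demo.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 20, 80
A = rng.standard_normal((m, n))

# Plant a sparse ground-truth solution u with 3 non-zeros and set b = A u.
u = np.zeros(n)
u[[5, 17, 42]] = [1.0, -2.0, 0.5]
b = A @ u

# l1 minimization (problem (4)) as an LP: minimize sum(xp + xn)
# subject to A (xp - xn) = b, xp >= 0, xn >= 0.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
x_l1 = res.x[:n] - res.x[n:]

# Minimum l2-norm solution of Ax = b via the pseudoinverse.
x_l2 = np.linalg.pinv(A) @ b

print("non-zeros (|x_i| > 1e-6):",
      int(np.sum(np.abs(x_l1) > 1e-6)), "for l1 vs",
      int(np.sum(np.abs(x_l2) > 1e-6)), "for l2")
```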

Reason:
Geometrically, the three optimization problems all look for the smallest $C$ such that the "norm ball" $\{x \mid \|x\| \leqslant C\}$ (where $\|\cdot\|$ denotes the corresponding norm) just intersects the set $Ax = b$. The figure below is a schematic diagram:

PS: A norm ball refers to the set of points in a vector space whose distance from a given center, measured by a given norm, is at most a given radius.

(Figure: the $\ell_0$, $\ell_1$ and $\ell_2$ norm balls intersecting the line $Ax = b$.)
For the $\ell_0$ "norm", when $C = 2$ the set $\{x \mid \|x\|_0 \leqslant C\}$ is the whole plane, which naturally intersects $Ax = b$; when $C = 1$ it degenerates into two straight lines (the coordinate axes), and the solution of the problem is then the intersection of $Ax = b$ with these two lines. For the $\ell_1$ norm, as $C$ varies the sets $\{x \mid \|x\|_1 \leqslant C\}$ are a family of squares whose vertices lie exactly on the coordinate axes; for the smallest such $C$, the intersection of the corresponding square with the line $Ax = b$ is generally a vertex, so the $\ell_1$ solution is sparse. For the $\ell_2$ norm, as $C$ takes different values the sets $\{x \mid \|x\|_2 \leqslant C\}$ are a family of circles with smooth boundaries; the point where the circle touches the line $Ax = b$ can be any point on the circle, so $\ell_2$-norm optimization problems generally cannot guarantee sparsity of the solution.

The $\ell_0$ "norm" is not actually a norm; the term is used here only for uniformity of description.
A norm is a mathematical concept that measures the size or length of a vector. Commonly used norms include the Euclidean norm, the Manhattan norm, and the Chebyshev norm.

Let $x$ be the $n$-dimensional vector $(x_1, x_2, \cdots, x_n)^\top$. Then the norm $\|x\|$ of $x$ can be defined as:

  • Euclidean norm (2-norm): $\|x\|_2=\sqrt{x_1^2+x_2^2+\cdots+x_n^2}$;
  • Manhattan norm (1-norm): $\|x\|_1=\sum\limits_{i=1}^n |x_i|$;
  • Chebyshev norm ($\infty$-norm): $\|x\|_{\infty}=\max\limits_{1\leq i \leq n}|x_i|$.

Properties of norms include:

  • Non-negativity: $\|x\|\geq 0$, and $\|x\|=0$ if and only if $x$ is the zero vector;
  • Homogeneity: $\|\alpha x\|=|\alpha|\,\|x\|$, where $\alpha$ is a scalar;
  • Triangle inequality: $\|x+y\|\leq\|x\|+\|y\|$;
  • Reflexivity: the Chebyshev norm, for instance, can equivalently be written as $\|x\|_{\infty}=\max\limits_{\|y\|_1\leq 1}x^\top y$, or as the limit of $p$-norms, $\|x\|_{\infty}=\lim\limits_{p \rightarrow\infty}\left(\sum\limits_{i=1}^n |x_i|^p\right)^{\frac{1}{p}}$.
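A small numerical check of the definitions and properties above (my own sketch; numpy's np.linalg.norm computes these norms directly):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
y = np.array([-1.0, 2.0, 5.0])

# The three norms defined above.
print(np.linalg.norm(x, 1))       # Manhattan norm: |3| + |-4| + |1| = 8
print(np.linalg.norm(x, 2))       # Euclidean norm: sqrt(9 + 16 + 1)
print(np.linalg.norm(x, np.inf))  # Chebyshev norm: max(|3|, |-4|, |1|) = 4

# Triangle inequality and homogeneity, checked for the 1-norm.
assert np.linalg.norm(x + y, 1) <= np.linalg.norm(x, 1) + np.linalg.norm(y, 1)
assert np.isclose(np.linalg.norm(-2.5 * x, 1), 2.5 * np.linalg.norm(x, 1))

# The Chebyshev norm as the limit of p-norms (p large).
print(np.linalg.norm(x, 50))      # already very close to 4
```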

Optimization problem with an $\ell_1$-norm regularization term:
$$
\min\limits_{x\in\mathbb{R}^n}\quad \mu\|x\|_1+\frac{1}{2}\|Ax-b\|_2^2, \tag{5}
$$

where $\mu > 0$ is a given regularization parameter. Problem (5) is also called LASSO (least absolute shrinkage and selection operator). It can be regarded as a quadratic penalty form of problem (4). Since it is an unconstrained optimization problem, it appears simpler in form than problem (4).
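As an illustration (a minimal sketch, not the book's algorithm), problem (5) can be solved by the proximal gradient method (ISTA): take a gradient step on the smooth term $\frac{1}{2}\|Ax-b\|_2^2$ and then apply soft-thresholding, the proximal operator of $\mu\|x\|_1$. The step size $1/L$ with $L = \|A\|_2^2$, the iteration count, and the test data below are assumptions made for the demo.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (component-wise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, mu, num_iters=500):
    """Proximal gradient (ISTA) for min_x mu * ||x||_1 + 0.5 * ||Ax - b||_2^2."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)           # gradient of 0.5 * ||Ax - b||_2^2
        x = soft_threshold(x - grad / L, mu / L)
    return x

# Tiny example with a planted sparse signal.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[[3, 40, 77]] = [2.0, -1.5, 1.0]
b = A @ x_true
x_hat = ista(A, b, mu=0.1)
print("approximate support:", np.flatnonzero(np.abs(x_hat) > 1e-3))
```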

Basic concepts of optimization

Global and local optimal solutions

Before solving the optimization problem, we first introduce the definition of the optimal solution of the minimization problem (1).

(Definition figure: global minimum solution, local minimum solution, and strict local minimum solution.)
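For reference, the usual textbook definitions are:

  • a point $x^* \in \mathcal{X}$ is a global minimum solution of (1) if $f(x) \geqslant f(x^*)$ for all $x \in \mathcal{X}$;
  • $x^*$ is a local minimum solution if there exists $\varepsilon > 0$ such that $f(x) \geqslant f(x^*)$ for all $x \in \mathcal{X}$ with $\|x - x^*\| \leqslant \varepsilon$;
  • $x^*$ is a strict local minimum solution if the above inequality holds strictly for all such $x \neq x^*$.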
If a point is a local minimum solution but not a strict local minimum solution, it is called a non-strict local minimum solution.

Optimization algorithm convergence

Explicit solution: for an optimization problem, if its optimal solution can be written down as an algebraic expression, this solution is called an explicit solution, and the corresponding problem is usually relatively simple.

However, practical problems often cannot be solved explicitly, so iterative algorithms are often used.

  • The basic idea of an iterative algorithm is: starting from an initial point $x^0$, iterate according to a given rule to obtain a sequence $\{x^k\}$. If the iteration terminates within a finite number of steps, the last point is taken as the solution of the optimization problem; if the iterate sequence is infinite, we hope that a limit point (or accumulation point) of the sequence is a solution of the optimization problem.
  • For a specific algorithm, depending on how it is designed, we cannot always guarantee a high-accuracy approximate solution. In that case, to avoid useless computational overhead, we also need convergence (stopping) criteria that terminate the algorithm in time; see the sketch after this list.
  • In algorithm design, an important criterion is whether the point sequence generated by the algorithm converges to a solution of the optimization problem.
  • Consider the unconstrained case. For an algorithm, given an initial point $x^0$, denote the iterates by $\{x^k\}$. If $\{x^k\}$ satisfies $\lim\limits_{k\to\infty}\|x^k - x^*\| = 0$ in some norm $\|\cdot\|$, and the limit $x^*$ is a local (global) minimum solution, then we say the point sequence converges to that local (global) minimum solution, and the corresponding algorithm is said to converge to a local (global) minimum solution in the sense of the point sequence.
  • Further, if the algorithm converges to a local (global) minimum solution in the sense of the point sequence from any initial point $x^0$, we say the algorithm globally converges to a local (global) minimum solution in the sense of the point sequence. Correspondingly, writing the function value sequence as $\{f(x^k)\}$, we can also define the notion that the algorithm (globally) converges to a local (global) minimum in the sense of function values.
  • For convex optimization problems, because any local optimal solution is also a global optimal solution, the convergence of an algorithm is stated with respect to the global minimum.
  • Besides the convergence of point sequences and function values, another notion commonly used in practice is the convergence of optimality measures at the iterates (such as the gradient norm in unconstrained problems, or the amount by which the optimality conditions are violated in constrained problems).
  • For the constrained case, given an initial point $x^0$, the point sequence $\{x^k\}$ generated by the algorithm is not necessarily feasible (i.e., $x^k \in \mathcal{X}$ may fail for some $k$). Taking constraint violation into account, we need to ensure that, after $\{x^k\}$ converges to $x^*$, the amount of violation is acceptable. Apart from this requirement, the definition of convergence is the same as in the unconstrained case.
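A minimal sketch of these ideas (an illustrative example of mine, not from the book): gradient descent on a smooth convex quadratic, with the gradient norm used as the stopping criterion. The step size and tolerance are assumptions chosen for the demo.

```python
import numpy as np

def gradient_descent(grad, x0, step, tol=1e-8, max_iters=10_000):
    """Iterate x^{k+1} = x^k - step * grad(x^k); stop when ||grad(x^k)|| <= tol."""
    x = x0.copy()
    for k in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) <= tol:   # optimality-based stopping criterion
            break
        x = x - step * g
    return x, k

# Example: f(x) = 0.5 * x^T Q x - c^T x, whose unique minimizer solves Q x = c.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, 1.0])

def grad(x):
    return Q @ x - c

x_star, iters = gradient_descent(grad, np.zeros(2), step=1.0 / np.linalg.norm(Q, 2))
print(x_star, np.linalg.solve(Q, c), iters)  # the two solutions should agree
```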

When designing optimization algorithms, we have some basic guidelines or techniques. For complex optimization problems, the basic idea is to transform them into a series of simple optimization problems (the optimal solutions of which are easy to calculate or have explicit expressions) to solve step by step. Commonly used techniques include:
(Figure: a summary of commonly used algorithm-design techniques.)

Asymptotic convergence rate of the algorithm

Take the Q-convergence rate of the point sequence ("Q" stands for "quotient") as an example (the Q-convergence rate in terms of function values can be defined similarly). Let $\{x^k\}$ be the iterate sequence generated by the algorithm, converging to $x^*$. (Definition figures: Q-linear, Q-superlinear and Q-quadratic convergence.)
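For reference, the usual textbook formulations are as follows. Write $r_k = \|x^k - x^*\|$:

  • Q-linear convergence: there exist $a \in (0, 1)$ and $k_0$ such that $r_{k+1} \leqslant a\, r_k$ for all $k \geqslant k_0$;
  • Q-superlinear convergence: $\lim\limits_{k\to\infty} r_{k+1}/r_k = 0$;
  • Q-quadratic convergence: there exists $a > 0$ such that $r_{k+1} \leqslant a\, r_k^2$ for all sufficiently large $k$.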

In addition to the Q-convergence rate, another commonly used concept is the R-convergence rate ("R" stands for "root").

(Definition figure: R-convergence rate.)
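In the usual textbook formulation, $\{x^k\}$ converges R-linearly (R-superlinearly, R-quadratically) to $x^*$ if there is a non-negative sequence $\{t_k\}$ with $\|x^k - x^*\| \leqslant t_k$ for all $k$ and $\{t_k\}$ converging Q-linearly (Q-superlinearly, Q-quadratically) to $0$.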

Algorithm complexity

A concept closely related to the convergence rate is the complexity $N(\varepsilon)$ of an optimization algorithm, that is, the number of iterations or floating-point operations required to compute a solution of given accuracy $\varepsilon$.
(Definition figure: algorithm complexity.)
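As an illustration (a standard result, stated here for context): for the gradient method applied to a convex function with Lipschitz-continuous gradient, $f(x^k) - f^* = \mathcal{O}(1/k)$, so reaching accuracy $f(x^k) - f^* \leqslant \varepsilon$ requires $N(\varepsilon) = \mathcal{O}(1/\varepsilon)$ iterations.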


Original article: blog.csdn.net/sinat_52032317/article/details/132978529