Practical Linear Algebra and Convex Optimization

Unless otherwise specified, the following dimensions are assumed.

$$X \in R^{n \times m}, \quad A \in R^{m \times n}$$

Trace

$$
tr(A) = \sum_i a_{ii} \\
tr(A+B) = tr(A) + tr(B) \\
tr(cA) = c \cdot tr(A) \\
tr(A) = tr(A^T) \\
tr(A^TB) = tr(AB^T) = \sum_{i,j} (A \circ B)_{ij} \\
tr(ba^T) = a^Tb
$$
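A quick numerical check of these identities (a minimal NumPy sketch; the matrix sizes and random data are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
a, b = rng.standard_normal(4), rng.standard_normal(4)

# tr(A^T B) = tr(A B^T) = sum of the elementwise (Hadamard) product
assert np.isclose(np.trace(A.T @ B), np.trace(A @ B.T))
assert np.isclose(np.trace(A.T @ B), np.sum(A * B))

# tr(b a^T) = a^T b
assert np.isclose(np.trace(np.outer(b, a)), a @ b)
```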

Derivatives

$$f(x) = ||x-p||_2 \;\Rightarrow\; \nabla f(x) = \frac{x-p}{||x-p||_2}$$

Vector

Subspace

$$x, y \in V \;\Rightarrow\; \alpha x + \beta y \in V, \quad \forall \alpha, \beta$$

Basis

A basis of a subspace S is a set B of vectors of minimal cardinality such that span(B) = S.

Norms

$$
||x|| \ge 0, \text{ with equality iff } x = 0 \\
||x+y|| \le ||x|| + ||y|| \\
||cx|| = |c| \cdot ||x||
$$

Cauchy-Schwarz Inequality

$$|x^Ty| \le ||x||_p \, ||y||_q, \quad \frac 1 p + \frac 1 q = 1$$

The case $p = q = 2$ is the Cauchy-Schwarz inequality proper; the general case is Hölder's inequality.

Theorems

Orthogonal Theorem

$$X = S \oplus S^{\perp}, \quad \text{for any subspace } S \subseteq X$$

Projection Theorem

$$\min_{x \in S} ||y - x|| \;\Rightarrow\; y^* \in S, \ (y-y^*) \perp S$$

Matrix

Partition

$$Ab = \sum_j a_j b_j, \qquad c^TA = \sum_i c_i A_i$$

where $a_j$ denotes the j-th column of A and $A_i$ the i-th row.

Range

$$
R(A) = \{Ax : x \in R^n\} \\
R^m = R(A) \oplus N(A^T) \\
R^n = R(A^T) \oplus N(A)
$$

Fundamental Theorem of Linear Algebra

Any $w \in R^m$ can be decomposed as
$$w = Ax + z, \quad z \in N(A^T)$$

Kernel

$$K = X^TX$$

Orthogonal

$$AA^T = A^TA = I, \quad A^T = A^{-1}$$

Schur Complements

$$M = \begin{bmatrix} A & X \\ X^T & B \end{bmatrix}, \qquad S = A - XB^{-1}X^T$$

If $B \succ 0$, then $M \succeq 0 \iff S \succeq 0$.
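A numerical illustration (a sketch with NumPy; the block sizes are arbitrary, and M is built so that B is PD and S is PSD):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
X = rng.standard_normal((n, m))
B = 2.0 * np.eye(m)                            # B is PD by construction
A = X @ np.linalg.inv(B) @ X.T + np.eye(n)     # makes S = I, hence PSD

M = np.block([[A, X], [X.T, B]])
S = A - X @ np.linalg.inv(B) @ X.T

# both M and the Schur complement S have non-negative eigenvalues
print(np.linalg.eigvalsh(M).min() >= -1e-10)   # True
print(np.linalg.eigvalsh(S).min() >= -1e-10)   # True
```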

Positive Definiteness

For a symmetric matrix $A \in R^{m \times m}$, PSD means
$$x^T A x \ge 0, \ \forall x \in R^m \iff \lambda_i(A) \ge 0$$
The determinant of a PSD matrix is non-negative, and so are its diagonal entries.

The definition of PD replaces every $\ge$ with $>$ (for all $x \ne 0$).

If a PSD matrix is invertible, then it is PD.

Matrix Norms

Frobenius Norm

$$||A||_F = \sqrt{tr(AA^T)} = \sqrt{\sum_{i,j} |A_{ij}|^2} = \sqrt{\sum_i \lambda_i(AA^T)}$$

Operator Norm

$$||A||_p = \max_{||u||_p = 1} ||Au||_p$$

$\ell_1$ norm: largest absolute column sum

$\ell_2$ norm: $\sqrt{\lambda_{max}(AA^T)} = \sigma_1$, the largest singular value

$\ell_\infty$ norm: largest absolute row sum

Nuclear Norm

$$||A||_* = \sum_i \sigma_i$$
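All three matrix norms above can be read off the singular values; a minimal NumPy check (random data of my choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
sigma = np.linalg.svd(A, compute_uv=False)      # singular values, descending

assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(sigma**2)))  # Frobenius
assert np.isclose(np.linalg.norm(A, 2), sigma[0])                       # operator l2 (spectral)
assert np.isclose(np.linalg.norm(A, 'nuc'), np.sum(sigma))              # nuclear
```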

Matrix Decomposition

Orthogonal-Triangular Decomposition (QR)

$$A = QR$$

For square matrix A, Q is orthogonal, R is upper triangular.

For a non-square matrix with m < n, we still have $Q^TQ = I_m$. It can be useful to partition both Q and R.
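A minimal sketch of the square case with numpy.linalg.qr:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

Q, R = np.linalg.qr(A)                    # Q orthogonal, R upper triangular
assert np.allclose(Q.T @ Q, np.eye(4))
assert np.allclose(np.triu(R), R)
assert np.allclose(Q @ R, A)
```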

Cholesky Decomposition

If A is PD, there exists a lower triangular matrix L such that
$$A = LL^T$$
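A sketch with numpy.linalg.cholesky (the PD matrix is constructed for the example):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((4, 4))
A = X @ X.T + np.eye(4)            # PD by construction

L = np.linalg.cholesky(A)          # lower-triangular factor
assert np.allclose(L @ L.T, A)
```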

Singular Value Decomposition (SVD)

For non-zero matrix A,
$$
A = U \Sigma V^T \\
Av_i = \sigma_i u_i, \quad A^T u_i = \sigma_i v_i, \quad i = 1 \sim r \\
\sigma_i^2 = \lambda_i(AA^T) = \lambda_i(A^TA), \quad i = 1 \sim r \\
||A||_F^2 = tr(A^TA) = \sum_{i=1}^n \sigma_i^2 \\
A_k = \tilde U \tilde \Sigma \tilde V^T \\
\text{variance explained} = \eta_k = \frac{||A_k||_F^2}{||A||_F^2} = \frac{\sigma_1^2 + ... + \sigma_k^2}{\sigma_1^2 + ... + \sigma_n^2}
$$
$u_i$ and $v_i$ are eigenvectors of $AA^T$ and $A^TA$, respectively.

$x_j$ in low dimension is $\tilde x_j = \tilde \Sigma \tilde V^T e_j$; to recover, use $x_j' = \tilde U \tilde x_j$.
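A sketch of the rank-k approximation and the variance-explained ratio (NumPy; the data matrix and k are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((10, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]          # rank-k approximation
eta_k = np.sum(s[:k]**2) / np.sum(s**2)              # variance explained

assert np.isclose(np.linalg.norm(A_k, 'fro')**2 / np.linalg.norm(A, 'fro')**2, eta_k)
print(f"variance explained by rank {k}: {eta_k:.3f}")
```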

Spectral Decomposition

A is a square symmetric matrix.
$$A = U\Lambda U^T = \sum_i \lambda_i u_i u_i^T$$
Rayleigh quotient
$$\lambda_{min} \le \frac{x^TAx}{x^Tx} \le \lambda_{max}, \quad x \ne 0$$
For any matrix, the matrix gain (spectral norm) is
$$||A||_2 = \max_{||x||_2=1} ||Ax||_2 = \sqrt{\lambda_{max}(A^TA)}$$

Sample Covariance Matrix

$$
C = \frac 1 m \sum_{i=1}^m (x_i - \hat x)(x_i - \hat x)^T \\
s_i = w^T x_i \\
\sigma^2 = \frac 1 m \sum_{i=1}^m (w^T x_i - \hat s)^2 = w^T C w \\
tr(C) = \frac 1 m ||X||_F^2
$$

where, in the last identity, X is the centered data matrix whose columns are $x_i - \hat x$.

Clearly, the sample covariance matrix is PSD.
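A sketch of these identities with the samples stored as columns of X (dimensions and data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 3, 50
X = rng.standard_normal((n, m))               # columns are the samples x_i
Xc = X - X.mean(axis=1, keepdims=True)        # centered data

C = (Xc @ Xc.T) / m                           # sample covariance matrix

# tr(C) = (1/m) * squared Frobenius norm of the centered data
assert np.isclose(np.trace(C), np.linalg.norm(Xc, 'fro')**2 / m)

# projected variance: var(w^T x_i) = w^T C w
w = rng.standard_normal(n)
s = w @ Xc                                    # centered scores w^T(x_i - x_hat)
assert np.isclose(np.mean(s**2), w @ C @ w)

# C is PSD
assert np.linalg.eigvalsh(C).min() >= -1e-10
```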

Ellipsoid

Let P be a PD matrix, such that $P = LL^T = U\Lambda U^T$. The standard form is
$$E = \{x : x^T P^{-1} x \le 1\}$$
Another common form can be converted to the standard one:
$$
E = \{\hat x + Lz : ||z||_2 \le 1\} \\
= \{x : ||L^{-1}(x - \hat x)||_2 \le 1\} \\
= \{x : (x - \hat x)^T P^{-1} (x - \hat x) \le 1\}
$$

Linear Equation

Systems

$$Ax = y$$

We know that the number of equations is m and the number of unknowns is n.

Overdetermined System: m > n, one solution or none

Underdetermined System: m < n, dim(set of solution) = n-m

Square System: m = n

We can solve the system using SVD.
$$A = U \Sigma V^T, \quad x' = V^T x, \quad y' = U^T y$$
Assume that rank(A) = r < m.
$$\Sigma x' = y' \;\Rightarrow\; y_i' = \begin{cases} \sigma_i x_i', & i = 1 \sim r \\ 0, & i = r+1 \sim m \end{cases}$$
If $y \notin R(A)$, the system is not feasible.

If $y \in R(A)$, the system is feasible and $x_i' = y_i' / \sigma_i$, $i = 1 \sim r$.

If A has full column rank, then the solution is unique.
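A sketch of the SVD-based solve for a feasible, rank-deficient square system (the rank-2 matrix and a right-hand side in R(A) are constructed for the example):

```python
import numpy as np

rng = np.random.default_rng(7)
m = 4
A = rng.standard_normal((m, 2)) @ rng.standard_normal((2, m))   # rank 2 < m
y = A @ rng.standard_normal(m)                                  # guarantees y in R(A)

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

yp = U.T @ y                   # y' = U^T y
xp = np.zeros(m)
xp[:r] = yp[:r] / s[:r]        # x_i' = y_i' / sigma_i for i = 1..r
x = Vt.T @ xp                  # back to original coordinates: x = V x'

assert np.allclose(A @ x, y)   # a solution (the minimum-norm one)
```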

Linear Dynamical System

$$x_{t+1} = A_t x_t$$

The system is time invariant if $A_t = A$. It can be extended to include inputs and an offset, or to an auto-regressive model.
$$x_{t+1} = A x_t + b$$
The steady state, reached as $t \rightarrow \infty$ when the iteration converges, satisfies $(I - A)x = b$.
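A sketch checking the steady state numerically (I force the spectral radius of A below 1 so the iteration converges):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3
A = rng.standard_normal((n, n))
A *= 0.5 / np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius 0.5 => stable
b = rng.standard_normal(n)

x = np.zeros(n)
for _ in range(200):                              # iterate x_{t+1} = A x_t + b
    x = A @ x + b

x_ss = np.linalg.solve(np.eye(n) - A, b)          # steady state: (I - A) x = b
assert np.allclose(x, x_ss)
```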

Least Square

Plain

$$\min_x ||Ax-y||_2^2 \;\Rightarrow\; x^* = (A^TA)^{-1}A^Ty$$

(assuming $A^TA$ is invertible, i.e. A has full column rank)

There is a weighted variant, which reduces to the plain form with $A_w = WA$ and $y_w = Wy$:
$$\min_x ||W(Ax-y)||_2^2 = \min_x ||A_w x - y_w||_2^2$$
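A sketch comparing the closed form with np.linalg.lstsq, plus the weighted variant (random data and diagonal weights of my choosing):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((20, 3))
y = rng.standard_normal(20)

x_normal = np.linalg.solve(A.T @ A, A.T @ y)           # (A^T A)^{-1} A^T y
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_normal, x_lstsq)

# weighted variant: min ||W(Ax - y)||^2 with A_w = WA, y_w = Wy
W = np.diag(rng.uniform(0.5, 2.0, size=20))
x_w, *_ = np.linalg.lstsq(W @ A, W @ y, rcond=None)
```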

Constrained

$$\min_x ||Ax-y||_2^2 \ :\ Cx = d$$

Define $x'$ s.t. $Cx' = d$. The solution set of the constraint is $x = x' + Bz$, where the columns of B form a basis of $N(C)$. We can convert the problem to
$$\min_z ||A'z - y'||_2^2, \quad A' = AB, \quad y' = y - Ax'$$
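A sketch of this reduction (scipy.linalg.null_space provides the basis B; the particular solution $x'$ comes from a least-squares solve, and all data are random):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(10)
A = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
C = rng.standard_normal((2, 5))
d = rng.standard_normal(2)

x_p, *_ = np.linalg.lstsq(C, d, rcond=None)       # particular solution: C x' = d
B = null_space(C)                                 # columns span N(C)

# reduced unconstrained problem: min_z ||(AB) z - (y - A x')||^2
z, *_ = np.linalg.lstsq(A @ B, y - A @ x_p, rcond=None)
x = x_p + B @ z

assert np.allclose(C @ x, d)                      # constraint is satisfied
```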

Penalties

Take L2 norm as an example.
$$\min_x ||Ax-y||_2^2 + \phi(x) = \min_x ||Ax-y||_2^2 + \lambda ||x||_2^2$$
We can construct a new A and y.
$$A' = \begin{bmatrix} A \\ \sqrt{\lambda}\, I_n \end{bmatrix}, \qquad y' = \begin{bmatrix} y \\ 0_n \end{bmatrix}$$
This way we can get the solution as below.
$$x^* = (A'^TA')^{-1}A'^Ty' = (A^TA + \lambda I)^{-1}A^Ty$$
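A sketch checking that the stacked system reproduces the closed-form ridge solution (random data, λ = 0.1 chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(11)
A = rng.standard_normal((20, 4))
y = rng.standard_normal(20)
lam = 0.1

A_aug = np.vstack([A, np.sqrt(lam) * np.eye(4)])       # A' = [A; sqrt(lam) I]
y_aug = np.concatenate([y, np.zeros(4)])               # y' = [y; 0]

x_stacked, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
x_closed = np.linalg.solve(A.T @ A + lam * np.eye(4), A.T @ y)
assert np.allclose(x_stacked, x_closed)
```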

Convex Optimization

Equality constraints are allowed if they are affine.

Linear Programming (LP)

$$\min\ c^Tx + d \ :\ Ax \le b$$
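A tiny LP solved with scipy.optimize.linprog (a sketch; the data are made up, and linprog minimizes $c^Tx$ subject to $A_{ub} x \le b_{ub}$ with non-negative variables by default):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])
A_ub = np.array([[1.0, 1.0]])
b_ub = np.array([1.0])                 # x1 + x2 <= 1, x >= 0 (default bounds)

res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res.x, res.fun)                  # optimum at x = (0, 0), value 0
```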

Quadratic Programming (QP)

$$\min\ \frac 1 2 x^THx + c^Tx + d \ :\ Ax \le b$$

H is a PSD matrix.

If H is PD, then
$$f(x) = \frac 1 2 (x-x^*)^T H (x-x^*) + d - \frac 1 2 x^{*T} H x^*, \qquad x^* = -H^{-1}c$$
If H is PSD and $c \in R(H)$, then
$$Hx^* + c = 0$$
Otherwise, the problem is unbounded.

Quadratic Constrained Quadratic Programming (QCQP)

$$\min\ \frac 1 2 x^TQ_0x + a_0^Tx \ :\ x^TQ_ix + a_i^Tx \le b_i$$

Second-Order Cone Programming (SOCP)

$$\min\ c^Tx \ :\ ||A_ix + b_i||_2 \le c_i^Tx + d_i$$

Robust Programming

$$\min_x \max_{u \in U} f_0(x, u) \ :\ f_i(x, u) \le 0, \ \forall u \in U$$

Consider a single inequality with uncertain coefficient vector.
$$a^Tx \le b, \quad \forall a \in U$$

Scenario Uncertainty

U is finite.
$$\max_{a \in U} a^Tx \le b$$

Box Uncertainty

$$
U = \{a : ||a - \hat a||_{\infty} \le \rho\} = \{\hat a + \rho u : ||u||_{\infty} \le 1\} \\
\max_{a \in U} a^Tx = \hat a^Tx + \rho ||x||_1
$$
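A numerical check of this closed form: the worst case picks $u = \mathrm{sign}(x)$ (a sketch; data are random):

```python
import numpy as np

rng = np.random.default_rng(12)
a_hat = rng.standard_normal(5)
x = rng.standard_normal(5)
rho = 0.3

worst_u = np.sign(x)                     # maximizer of u^T x over ||u||_inf <= 1
lhs = (a_hat + rho * worst_u) @ x        # worst-case value of a^T x
rhs = a_hat @ x + rho * np.linalg.norm(x, 1)
assert np.isclose(lhs, rhs)
```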

Spherical Uncertainty

$$
U = \{a : ||a - \hat a||_2 \le \rho\} \\
\max_{a \in U} a^Tx = \hat a^Tx + \rho ||x||_2
$$

Ellipsoidal Uncertainty

$$
U = \{a : (a - \hat a)^T P^{-1} (a - \hat a) \le 1\} = \{\hat a + Ru : ||u||_2 \le 1\}, \quad P = RR^T \\
\max_{a \in U} a^Tx = \hat a^Tx + ||R^Tx||_2
$$

Convexity

A subset is said to be convex if it contains the line segment between any two points in it.
$$x_1, x_2 \in C, \ \lambda \in [0, 1] \;\Rightarrow\; \lambda x_1 + (1 - \lambda)x_2 \in C$$
A function f is convex if its domain is a convex set and $f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda)f(x_2)$ for all $x_1, x_2$ in the domain and $\lambda \in [0,1]$. By convention, convex functions take the value $+\infty$ outside their domains.

The epigraph of a function is
$$epi\, f = \{(x,t), \ x \in dom\, f, \ t \in R : f(x) \le t\}$$
f is a convex function iff epi f is a convex set.

Optimality

Consider minimizing a convex differentiable $f_0$ over a convex feasible set $\mathcal X$ (for example $\mathcal X = \{x : Ax = b\}$)
$$\min_x f_0(x) \ :\ x \in \mathcal X$$
Then $x^*$ is optimal iff
$$\nabla f_0(x^*)^T(x - x^*) \ge 0, \quad \forall x \in \mathcal X$$
The sufficiency follows from convexity:
$$f_0(x) \ge f_0(x^*) + \nabla f_0(x^*)^T(x - x^*)$$
In a convex unconstrained problem with differentiable objective, x is optimal iff $\nabla f_0(x) = 0$. If there is an equality constraint $Ax = b$, then x is optimal iff $Ax = b$ and $\exists v : \nabla f_0(x) + A^Tv = 0$.

Hulls

Given a set of points $P = \{x_1, ..., x_m\}$ in $R^n$.

Linear Hull

$$x = \sum_{i=1}^m \lambda_i x_i$$

Affine Hull

$$x = \sum_{i=1}^m \lambda_i x_i \ :\ \sum_{i=1}^m \lambda_i = 1$$

aff $P$ is the smallest affine set containing $P$.

Conic Hull

$$x = \sum_{i=1}^m \lambda_i x_i \ :\ \lambda_i \ge 0$$

Convex Hull

$$x = \sum_{i=1}^m \lambda_i x_i \ :\ \sum_{i=1}^m \lambda_i = 1, \ \lambda_i \ge 0$$

Preserving Convexity

Intersection Rule

The intersection of convex sets is also a convex set, and this holds for infinite families of convex sets. The intersection of finitely many halfspaces is called a polyhedron.

Affine Transformation

An affine mapping of a convex set is still convex.

Pointwise Maximization

If $(f_a)_{a \in A}$ is a family of convex functions, and A is an arbitrary index set (not necessarily a convex set), then the pointwise maximum is convex.
$$f(x) = \max_{a \in A} f_a(x)$$

Partial Minimization

If g(x, y) is jointly convex in (x, y) and C is convex, then $f(x) = \min_{y \in C} g(x, y)$ is convex. This result trivially extends to partial minimization over a subset of the function's arguments.

Composition Function

$$x \rightarrow f(g(x))$$

If f is convex and increasing, and g is convex, then the composition is convex in x.

Constraints

Activeness

If $x^*$ satisfies $f_i(x^*) < 0$, then the i-th inequality constraint is inactive (slack) at the optimal solution $x^*$.

Problem Transformations

An optimization problem can be transformed into an equivalent one by:

  • monotone transformation (scaling, logarithm, squaring)
  • change of variables
  • addition of slack variables
  • epigraphic reformulation
  • replacement of equality constraints with inequality ones
  • elimination of inactive constraints (safe feature elimination)
  • discovering hidden convexity

Duality

Problem Formulation

Consider an optimization problem in standard form
$$
p^* = \min_x f_0(x) \\
s.t.: f_i(x) \le 0, \ i = 1 \sim m \\
h_i(x) = 0, \ i = 1 \sim q
$$
Note that the objective function and the constraints are not necessarily convex.

Lagrangian

Vectors $\lambda$ and $v$ are referred to as Lagrange multipliers.
$$\mathcal L(x, \lambda, v) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^q v_i h_i(x)$$
The primal can be expressed as
$$p^* = \min_x \max_{\lambda \ge 0, v} \mathcal L(x, \lambda, v)$$

Recovering Primal Solutions

If $\mathcal L(x, \lambda^*, v^*)$ has a unique minimizer, then either that minimizer is the primal-optimal solution, or, if it is not primal-feasible, no primal-optimal solution exists.

Weak Duality

The minimax inequality is
$$p^* = \min_x \max_y F(x,y) \ge \max_y \min_x F(x,y) = d^*$$
Therefore, weak duality states that
$$
g(\lambda, v) = \min_x \mathcal L(x, \lambda, v) \\
d^* = \max_{\lambda \ge 0, v} g(\lambda, v) \le p^*
$$

Strong Duality

Strong duality is achieved when $p^* = d^*$.

Sion’s Minimax Theorem

Let X be convex and Y be convex and compact (bounded and closed). If F(x, y) is convex in x over X and concave in y over Y, then
$$\min_x \max_y F(x,y) = \max_y \min_x F(x,y)$$

Slater’s Condition

For a convex problem, if it is strictly feasible then strong duality holds; namely, there exists $x_0 \in relint\, D$ such that $f_i(x_0) < 0$ for all i.

Karush-Kuhn-Tucker Condition (KKT)

If strong duality holds, any primal and dual optimal points satisfy the KKT conditions; conversely, for a convex problem, any points satisfying the KKT conditions are primal and dual optimal, and strong duality holds.

Primal feasibility: $f_i(x) \le 0$, $h_i(x) = 0$

Dual feasibility: $\lambda \ge 0$

Complementary slackness: $\lambda_i f_i(x) = 0$

Lagrangian stationarity: $\nabla_x \mathcal L(x, \lambda, v) = \nabla_x f_0(x) + \sum_{i=1}^m \lambda_i \nabla_x f_i(x) + \sum_{i=1}^q v_i \nabla_x h_i(x) = 0$
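A minimal worked example of my own: minimize $x^2$ subject to $1 - x \le 0$.
$$
\mathcal L(x, \lambda) = x^2 + \lambda(1 - x) \\
\text{stationarity: } 2x - \lambda = 0 \\
\text{complementary slackness: } \lambda(1 - x) = 0
$$
If $\lambda = 0$, stationarity forces $x = 0$, which violates primal feasibility; so the constraint is active, $x^* = 1$, $\lambda^* = 2$, and all four conditions hold. The dual function is $g(\lambda) = \min_x \mathcal L(x, \lambda) = \lambda - \lambda^2/4$, maximized at $\lambda^* = 2$ with $d^* = 1 = p^*$.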

Reference

  • Optimization Models in Engineering (EECS 227 A), Laurent El Ghaoui, University of California Berkeley, Fall 2021
