First-Order Conditions
This section discusses the first-order (KKT) conditions in detail.
Sequential Feasible Direction
**Definition** Let $x'$ be a feasible point and $\{x^{(k)}\}$ a feasible sequence satisfying $x^{(k)} \to x'$ and $x^{(k)} \neq x'$ for all $k$. Write

$$x^{(k)} - x' = \delta_k p^{(k)},$$

where $\delta_k > 0$, $\delta_k \to 0$, and each $p^{(k)}$ is a vector of fixed length. Any accumulation point $p$ of $\{p^{(k)}\}$ is called a sequential feasible direction of the feasible region $\Omega$ at $x'$; the set of all sequential feasible directions at $x'$ is denoted $\mathcal{F}'$.
By definition, every sequential feasible direction is generated by some feasible sequence.
For example, at the feasible point $x' = 0$ for $\Omega_1 = \{x\in\mathbb{R}^2 : x_2 \geq x_1^2\}$, the sequential feasible directions are exactly the vectors satisfying $p_2 \geq 0$. For $\Omega_2 = \{x\in\mathbb{R}^2 : x_2 = x_1^2\}$, the sequential feasible directions must satisfy

$$p_1 \neq 0, \quad p_2 = 0.$$
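These limit directions can be checked numerically. Below is a small sketch (my own illustration, not from the source) that approximates the limit direction $p = \lim p^{(k)}$ for the feasible sequence $x^{(k)} = (1/k, 1/k^2)$, which lies in both $\Omega_1$ and $\Omega_2$:

```python
import math

def direction(k):
    x = (1.0 / k, 1.0 / k**2)            # feasible point x^(k), converging to x' = 0
    delta = math.hypot(x[0], x[1])       # delta_k = ||x^(k) - x'||
    return (x[0] / delta, x[1] / delta)  # unit vector p^(k)

p = direction(10**6)
print(p)  # p1 -> 1, p2 -> 0: the limit direction is (1, 0)
```

Note that the limit $(1, 0)$ has $p_2 = 0$, which is why the condition for $\Omega_1$ is $p_2 \geq 0$ rather than a strict inequality.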
First-Order Necessary Condition
Denote by $D'$ the set of descent directions of $f$ at the feasible point $x'$:

$$D' = \{p\in\mathbb{R}^n : p^T g' < 0\},$$

where $g' = \nabla f(x')$.
**Lemma** Let $x^*$ be a local minimizer of the constrained problem. Then at $x^*$ no sequential feasible direction is a descent direction, that is,

$$\mathcal{F}^* \cap D^* = \varnothing.$$
In other words, the lemma states that if $x^*$ is a local minimizer, then the objective function has a nonnegative directional derivative at $x^*$ along every sequential feasible direction.
Linearized Feasible Directions
However, the set $\mathcal{F}^*$ is usually not easy to compute, so we consider another set of feasible directions that is easier to compute. The first-order Taylor approximation of the constraint function $c_i$ at $x'$ is

$$c_i(x' + s) \approx c_i(x') + \nabla c_i(x')^T s.$$

This motivates defining a linearized feasible direction at $x'$ as a nonzero vector $p$ satisfying

$$\begin{aligned} p^T a_i' &= 0, \quad i \in \mathcal{E}, \\ p^T a_i' &\leq 0, \quad i \in \mathcal{I} \cap \mathcal{A}', \end{aligned}$$

where $a_i' = \nabla c_i(x')$ and $\mathcal{A}'$ is the active set at $x'$. The set of all linearized feasible directions is denoted $F'$.
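Membership in $F'$ reduces to a finite list of linear checks on the active gradients. As a concrete sketch (helper function and example of my own, following the definition above):

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def is_linearized_feasible(p, eq_grads, active_ineq_grads, tol=1e-12):
    """Check p^T a_i' = 0 for equality gradients and p^T a_i' <= 0
    for active inequality gradients."""
    return (all(abs(dot(p, a)) <= tol for a in eq_grads)
            and all(dot(p, a) <= tol for a in active_ineq_grads))

# Omega_1 = {x2 >= x1^2}, written as c(x) = x1^2 - x2 <= 0; at x' = 0 the
# constraint is active with gradient a' = (0, -1).
print(is_linearized_feasible((1.0, 0.0), [], [(0.0, -1.0)]))   # True:  p2 = 0 is allowed
print(is_linearized_feasible((0.0, -1.0), [], [(0.0, -1.0)]))  # False: p2 < 0 violates p^T a' <= 0
```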
Obviously, it would be convenient if $\mathcal{F}'$ and $F'$ coincided.
**Lemma** $\mathcal{F}' \subseteq F'$
**Proof** If $p \in \mathcal{F}'$, then there exists a feasible sequence $\{x^{(k)}\}$ satisfying

$$x^{(k)} - x' = \delta_k p^{(k)},$$

where $\delta_k \to 0$ and $p^{(k)} \to p$. Expanding the constraint $c_i$ in a first-order Taylor series at $x'$ gives

$$c_i(x^{(k)}) = c_i(x') + \delta_k (p^{(k)})^T a_i' + o(\delta_k).$$

For $i \in \mathcal{E}$ we have $c_i(x^{(k)}) = c_i(x') = 0$; for active $i \in \mathcal{I}$ we have $c_i(x^{(k)}) \leq c_i(x') = 0$. Dividing by $\delta_k$,

$$\frac{c_i(x^{(k)})}{\delta_k} = \frac{c_i(x')}{\delta_k} + (p^{(k)})^T a_i' + \frac{o(\delta_k)}{\delta_k}.$$

Letting $k \to \infty$ yields $p^T a_i' = 0$ for $i \in \mathcal{E}$ and $p^T a_i' \leq 0$ for active $i \in \mathcal{I}$, so $p \in F'$, which proves the lemma. $\blacksquare$
Unfortunately, the reverse inclusion $F' \subseteq \mathcal{F}'$ does not necessarily hold.
**Example** Define the set

$$\Omega = \{x\in\mathbb{R}^2 : x_2 \leq x_1^3,\ x_2 \geq 0\}$$

and consider the feasible point $x' = (0,0)^T$. The vector $p = (-1,0)^T$ is a linearized feasible direction, $p \in F'$, yet no feasible sequence converges to $x'$ along $p$ (every feasible point has $x_1 \geq 0$), so $p \notin \mathcal{F}'$.
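Both halves of the counterexample can be verified numerically. The sketch below (my own, with the region written as $c_1(x) = x_2 - x_1^3 \leq 0$ and $c_2(x) = -x_2 \leq 0$, both active at the origin) checks them:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

p = (-1.0, 0.0)
a1, a2 = (0.0, 1.0), (0.0, -1.0)  # gradients of c1, c2 at x' = (0, 0)
in_F_lin = dot(p, a1) <= 0 and dot(p, a2) <= 0
print(in_F_lin)  # True: p is a linearized feasible direction

# Yet every feasible point has 0 <= x2 <= x1^3, hence x1 >= 0, so no feasible
# sequence can approach (0, 0) along p = (-1, 0): p is not in F'.
def feasible(x):
    return x[1] <= x[0]**3 and x[1] >= 0

print(feasible((-0.1, 0.0)))  # False: points in the direction p are infeasible
```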
Constraint Qualifications
A constraint qualification (CQ) is an assumption that guarantees $F' = \mathcal{F}'$. It should be noted that it is rare for a constraint qualification to fail.
**Lemma** At a feasible point $x'$, suppose that either

(1) LCQ: $c_i(x)$, $i \in \mathcal{A}'$, are linear functions, or

(2) LICQ: $a_i'$, $i \in \mathcal{A}'$, are linearly independent

holds. Then $F' = \mathcal{F}'$.
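LICQ in particular is mechanical to check: stack the active-constraint gradients as rows and test the rank. A minimal sketch (my own, using the two regions from this section):

```python
import numpy as np

# Omega_1 = {x1^2 - x2 <= 0} at x' = 0: one active gradient a' = (0, -1).
A1 = np.array([[0.0, -1.0]])
licq_1 = np.linalg.matrix_rank(A1) == A1.shape[0]
print(licq_1)  # True: LICQ holds

# Counterexample region {x2 - x1^3 <= 0, -x2 <= 0} at (0, 0):
# gradients (0, 1) and (0, -1) are linearly dependent.
A2 = np.array([[0.0, 1.0], [0.0, -1.0]])
licq_2 = np.linalg.matrix_rank(A2) == A2.shape[0]
print(licq_2)  # False: LICQ fails, consistent with F' != F_lin there
```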
Farkas Lemma
**Farkas Lemma** Given $n$-dimensional vectors $a_1, a_2, \cdots, a_m$ and $g$, the set

$$S = \{p\in\mathbb{R}^n : p^T g < 0,\ p^T a_i \leq 0,\ i = 1,2,\cdots,m\}$$

is empty if and only if there exist $\lambda_i \geq 0$, $i = 1,2,\cdots,m$, such that

$$-g = \sum_{i=1}^m a_i \lambda_i.$$
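The two alternatives of the lemma can be illustrated on a concrete instance (vectors of my own choosing, not from the source): for $a_1 = (1,0)$, $a_2 = (0,1)$, $g = (-1,-1)$, the set $S$ is empty, and $\lambda = (1,1)$ certifies $-g = a_1\lambda_1 + a_2\lambda_2$:

```python
import itertools

a1, a2, g = (1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Sample directions p on a grid: none should lie in
# S = {p : p^T g < 0, p^T a1 <= 0, p^T a2 <= 0}.
grid = [x / 10.0 for x in range(-10, 11)]
S_empty = not any(dot(p, g) < 0 and dot(p, a1) <= 0 and dot(p, a2) <= 0
                  for p in itertools.product(grid, grid))
print(S_empty)  # True: the grid finds no point of S

# The lemma then predicts multipliers lambda_i >= 0 with -g = sum a_i * lambda_i;
# here lambda = (1, 1) works componentwise.
lam = (1.0, 1.0)
print(all(-gi == lam[0] * a1i + lam[1] * a2i
          for gi, a1i, a2i in zip(g, a1, a2)))  # True
```

Sampling a grid is of course only suggestive; here the emptiness of $S$ also follows directly, since $p_1 \leq 0$ and $p_2 \leq 0$ force $p^T g = -p_1 - p_2 \geq 0$.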
Given vectors $a_1, a_2, \cdots, a_m$ in $\mathbb{R}^n$, let

$$C = \left\{ v\in\mathbb{R}^n : v = \sum_{i=1}^m a_i \lambda_i,\ \lambda_i \geq 0 \right\}.$$

Then $C$ is a polyhedral cone and a closed convex set. If a vector $a \notin C$, then there exists a hyperplane separating $C$ from $a$; that is, there exists a nonzero vector $p$ such that

$$p^T a > 0, \quad p^T v \leq 0,\ \forall v \in C.$$
To connect the necessary condition of the lemma with Lagrange multipliers, we extend the Farkas lemma to the case with equality constraints.
**Corollary** Given $g^*$, $a_i^*$, $i\in\mathcal{E}$, and $a_i^*$, $i\in\mathcal{I}^*$ in $\mathbb{R}^n$, the set

$$S = \{p\in\mathbb{R}^n : p^T g^* < 0,\ p^T a_i^* = 0,\ i\in\mathcal{E},\ p^T a_i^* \leq 0,\ i\in\mathcal{I}^*\}$$

is empty if and only if there exist $\lambda_i^*$, $i\in\mathcal{E}$, and $\lambda_i^* \geq 0$, $i\in\mathcal{I}^*$, such that

$$-g^* = \sum_{i\in\mathcal{E}} \lambda_i^* a_i^* + \sum_{i\in\mathcal{I}^*} \lambda_i^* a_i^*.$$
The KKT conditions can now be proved from the results above.

**Regularity assumption 1**: $F^* \cap D^* = \mathcal{F}^* \cap D^*$.

If $x^*$ is a local minimizer and regularity assumption 1 holds at $x^*$, then

$$F^* \cap D^* = \varnothing.$$

By the corollary of the Farkas lemma, there exist $\lambda_i^*$, $i \in \mathcal{A}^*$, with $\lambda_i^* \geq 0$ for $i \in \mathcal{I}^*$, such that

$$g^* + \sum_{i\in\mathcal{E}} \lambda_i^* a_i^* + \sum_{i\in\mathcal{I}^*} \lambda_i^* a_i^* = 0.$$

For $i \in \mathcal{I}\backslash\mathcal{I}^*$ we have $c_i(x^*) < 0$, and we set $\lambda_i^* = 0$.
KKT conditions
**Theorem (first-order necessary conditions)** If $x^*$ is a local minimizer and the regularity assumption

$$F^* \cap D^* = \mathcal{F}^* \cap D^*$$

holds at $x^*$, then there exists a Lagrange multiplier vector $\lambda^*$ such that $x^*, \lambda^*$ satisfy

$$\begin{aligned} \nabla_x \mathcal{L}(x^*,\lambda^*) &= 0, \\ c_i(x^*) &= 0, \quad i \in \mathcal{E}, \\ c_i(x^*) &\leq 0, \quad i \in \mathcal{I}, \\ \lambda_i^* &\geq 0, \quad i \in \mathcal{I}, \\ \lambda_i^* c_i(x^*) &= 0, \quad i \in \mathcal{I}. \end{aligned}$$
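Checking the KKT system at a candidate point is straightforward. A hedged sketch on a toy problem of my own choosing (not from the source), $\min\,(x_1-1)^2 + (x_2-1)^2$ s.t. $x_1 + x_2 - 1 \leq 0$, whose minimizer is $x^* = (0.5, 0.5)$ with multiplier $\lambda^* = 1$:

```python
def kkt_satisfied(x, lam, tol=1e-8):
    grad_f = (2 * (x[0] - 1), 2 * (x[1] - 1))  # gradient of the objective
    grad_c = (1.0, 1.0)                         # gradient of c(x) = x1 + x2 - 1
    c = x[0] + x[1] - 1
    stationarity = all(abs(gf + lam * gc) <= tol
                       for gf, gc in zip(grad_f, grad_c))  # grad_x L = 0
    primal = c <= tol                                      # c_i(x) <= 0
    dual = lam >= -tol                                     # lambda_i >= 0
    complementarity = abs(lam * c) <= tol                  # lambda_i c_i(x) = 0
    return stationarity and primal and dual and complementarity

print(kkt_satisfied((0.5, 0.5), 1.0))  # True
print(kkt_satisfied((1.0, 1.0), 0.0))  # False: the unconstrained minimizer is infeasible
```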
The regularity assumption is a further relaxation of the conditions imposed on the vectors $a_i'$, $i \in \mathcal{A}'$.
**Theorem** Let $x^*$ be a local minimizer of the constrained problem, and suppose LCQ or LICQ holds at $x^*$. Then $x^*$ satisfies the KKT conditions.
**Example** Consider the problem

$$\begin{aligned} \min ~~ & x_2 \\ \mathrm{s.t.} ~~ & x_2 \leq x_1^3,\ x_2 \geq 0 \end{aligned} \tag{1}$$

and

$$\begin{aligned} \min ~~ & x_1 \\ \mathrm{s.t.} ~~ & x_2 \leq x_1^3,\ x_2 \geq 0 \end{aligned} \tag{2}$$

At the solution $x^* = (0,0)^T$, it is easy to verify that problem (1) satisfies the regularity assumption while problem (2) does not.
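The difference between (1) and (2) also shows up in whether KKT multipliers exist. A sketch (my own arithmetic): at $x^* = (0,0)^T$ both constraints are active with gradients $a_1^* = (0,1)^T$ and $a_2^* = (0,-1)^T$, so stationarity $g^* + \lambda_1 a_1^* + \lambda_2 a_2^* = 0$ with $\lambda_i \geq 0$ is solvable precisely when the first component of $g^*$ vanishes:

```python
def stationarity_solvable(g):
    # With a1* = (0, 1) and a2* = (0, -1) the equation g + l1*a1 + l2*a2 = 0 reads
    #   g[0] = 0   and   g[1] + l1 - l2 = 0  with  l1, l2 >= 0.
    if g[0] != 0:
        return False
    # l1 - l2 = -g[1] always has a nonnegative solution,
    # e.g. (l1, l2) = (max(-g[1], 0), max(g[1], 0)).
    return True

print(stationarity_solvable((0.0, 1.0)))  # problem (1), g* = grad x2: True, multipliers exist
print(stationarity_solvable((1.0, 0.0)))  # problem (2), g* = grad x1: False, no multipliers
```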