Lagrangian Relaxation (Part 1): Theory and Algorithms

Background

Marshall L. Fisher's paper "The Lagrangian Relaxation Method for Solving Integer Programming Problems"[1], published in Management Science in 1981, was named in 2004 one of the ten most influential topics of the journal's first 50 years.

Searching CNKI (China National Knowledge Infrastructure) with "Lagrangian relaxation" as a keyword over titles, keywords, and abstracts, and running a bibliometric visualization analysis on the results, yields the charts below.

CNKI publication counts (visual analysis)
The publication-count metric shows that papers involving this method have increased markedly over the past 10 years.

Distribution of main topics of CNKI
Distribution of secondary topics of CNKI
The distributions of main and secondary topics show that, at the theory level, the method is closely tied to dynamic programming, integer programming, convex optimization, heuristic algorithms, and related topics; at the application level, it is used in unit commitment, resource allocation, scheduling, supply chains, facility location, and other areas.

Lagrangian Relaxation: Theory

The theory and algorithm sections follow reference [2], covering the idea behind Lagrangian relaxation, its key results, and the Lagrangian relaxation algorithm; proofs of the results and further material can be found in that book. This post also draws on other online resources to aid understanding.

To relax means to loosen constraints. For an optimization problem standardized as a minimization, loosening constraints may admit solutions with smaller objective values; in other words, relaxation yields a lower bound on the original problem. This provides a way to assess the effectiveness of other algorithms and supplies additional information for solving the original problem.

The method is most often used for integer programs and mixed-integer programs.

There are four main relaxation approaches: linear programming relaxation (relaxing integrality constraints to real-valued ones); dual relaxation (solving the dual problem: by weak duality, the dual $\max$ problem provides a lower bound on the primal $\min$ problem); surrogate relaxation (reducing the number of constraints by combining several of them, e.g., by summation); and Lagrangian relaxation. This post introduces Lagrangian relaxation.

The basic idea of Lagrangian relaxation: absorb the constraints that make the problem hard into the objective function while keeping the objective linear, so that the transformed problem either can be solved in polynomial time or, even if it cannot, is small enough in scale[3] to be solved quickly, thereby helping to solve the original problem.
Problem Description

As shown in the figure above[4] (the thin red boxes should read "$\ge$" and "$\le$", and the thick red box should read "lower", as in lower bound), suppose the constraints of a problem can be divided into two parts, $A$ and $D$ ($D_1$, $D_2$, $D_3$). The difference between them is that each block of $D$ involves only part of the variables, while $A$ involves all of the variables; $A$ is then the "constraint that makes the problem hard" mentioned above (also called the complicating constraint). Lagrangian relaxation moves this complicating constraint into the objective function. For why $\lambda$ is required to be nonnegative, see [5]. The figure below[6] may help with the intuition.
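
Since the figure itself is not reproduced here, the formulation it depicts can be written out explicitly. The following is a reconstruction consistent with the decomposition formulas later in this post, taking the complicating constraints as $Ax \le b$ so that they match the relaxed term $\lambda^T(Ax-b)$ used below:

$$
\begin{array}{ll}
z=\text{minimize} & c^T x \\
\text{subject to} & Ax \le b \quad \text{(complicating constraints)} \\
& Dx \le e \quad \text{(easy constraints)} \\
& x \in \mathbb{Z}^n_+
\end{array}
$$

Relaxing $Ax \le b$ with multipliers $\lambda \ge 0$ gives

$$
\begin{array}{ll}
z_{LR}(\lambda)=\text{minimize} & c^T x + \lambda^T(Ax - b) \\
\text{subject to} & Dx \le e \\
& x \in \mathbb{Z}^n_+
\end{array}
$$

For any $x$ that is feasible for the original problem, $\lambda^T(Ax-b)\le 0$, so $z_{LR}(\lambda)\le z$.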

Understanding of the objective function after relaxation
Furthermore, the relaxed objective value $z_{LR}(\lambda)$ is a lower bound on the original objective value $z$. Since we want this bound to be as close as possible to the original optimum, we maximize $z_{LR}(\lambda)$ over $\lambda$ (this constitutes the dual problem), obtaining the best of all these lower bounds, $z_{LD}$ (the red box in the figure below[4] should read $z_{LD}$).
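
In symbols, the dual problem just described is

$$z_{LD} = \max_{\lambda \ge 0}\ z_{LR}(\lambda) \le z.$$
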
From Lagrangian relaxation to solving dual problems
The uses of Lagrangian relaxation are summarized below[6].
Lagrangian relaxation uses
The above is the basic theory of Lagrangian relaxation; some common variants are introduced next.

The first is relaxation of equality constraints. This parallels two familiar facts: the dual variable for an equality constraint in an LP dual is unrestricted in sign, and the multiplier for an equality constraint in the Karush-Kuhn-Tucker (KKT) conditions is unrestricted in sign. Likewise, the multiplier for a relaxed equality constraint is sign-free (the difference of two nonnegative multipliers is a single sign-free multiplier).
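
As a generic sketch, relaxing an equality constraint $Ax = b$ gives

$$z_{LR}(\lambda)=\min\ \{\,c^T x + \lambda^T(Ax - b)\ :\ Dx \le e,\ x \in \mathbb{Z}^n_+\,\}, \qquad \lambda \in \mathbb{R}^m,$$

and the bound $z_{LR}(\lambda)\le z$ now holds for every $\lambda$, whatever its sign, because $Ax-b=0$ for any feasible $x$.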

The second is Lagrangian decomposition. Looking back at the problem above[4], set $n=3$, consistent with the schematic below.
Problem Description
Splitting $x$ into $x_1$, $x_2$, and $x_3$ in the relaxed problem's objective function, the problem can be equivalently transformed as follows.
$$
\begin{array}{ll}
z_{LR}(\lambda)=\text{minimize} & c_1^T x_1 + c_2^T x_2 + c_3^T x_3 + \lambda^T(A_1 x_1 + A_2 x_2 + A_3 x_3 - b) \\
\text{subject to} & D_1^T x_1 \le e_1 \\
& D_2^T x_2 \le e_2 \\
& D_3^T x_3 \le e_3 \\
& x_1, x_2, x_3 \in \mathbb{Z}^n_+
\end{array}
$$

Further, the problem above can be decomposed by $x_1$, $x_2$, and $x_3$ into three subproblems; since the objective function and the constraints are all separable in $x_1$, $x_2$, and $x_3$, this step is an equivalent transformation.

Subproblem 1:

$$
\begin{array}{ll}
z_{LR1}(\lambda)=\text{minimize} & c_1^T x_1+\lambda^T(A_1x_1-b) \\
\text{subject to} & D_1^T x_1 \le e_1 \\
& x_1\in \mathbb{Z}^n_+
\end{array}
$$

Subproblem 2:

$$
\begin{array}{ll}
z_{LR2}(\lambda)=\text{minimize} & c_2^T x_2+\lambda^T A_2 x_2 \\
\text{subject to} & D_2^T x_2 \le e_2 \\
& x_2\in \mathbb{Z}^n_+
\end{array}
$$

Subproblem 3:

$$
\begin{array}{ll}
z_{LR3}(\lambda)=\text{minimize} & c_3^T x_3+\lambda^T A_3 x_3 \\
\text{subject to} & D_3^T x_3 \le e_3 \\
& x_3\in \mathbb{Z}^n_+
\end{array}
$$

The objective values of the decomposed subproblems relate to the original objective value as follows:

$$z_{LR1}(\lambda)+z_{LR2}(\lambda)+z_{LR3}(\lambda)=z_{LR}(\lambda) \le z$$

The dual problem is then

$$z_{LD}= \max_{\lambda \ge 0}\ \left[z_{LR1}(\lambda)+z_{LR2}(\lambda)+z_{LR3}(\lambda)\right]$$

It can be seen that, building on Lagrangian relaxation, Lagrangian decomposition splits the original problem into subproblems free of complicating constraints, so that the original problem can be approximated by solving easier subproblems.
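
To make this concrete, here is a minimal Python sketch that evaluates $z_{LR}(\lambda)$ for a fixed $\lambda$ by solving the three subproblems independently. All of the data ($c_k$, $A_k$, $D_k$, $e_k$, $b$) are hypothetical toy values, and a box bound stands in for $\mathbb{Z}^n_+$ so that brute-force enumeration stays finite; a real implementation would hand each block to an integer-programming or combinatorial solver.

```python
import itertools
import numpy as np

# Hypothetical toy data for three blocks of two integer variables each.
c = [np.array([-3.0, 1.0]), np.array([2.0, -4.0]), np.array([-1.0, 1.0])]
A = [np.array([[1.0, 2.0]]), np.array([[2.0, 1.0]]), np.array([[1.0, 1.0]])]
D = [np.array([[1.0, 1.0]]), np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
e = [np.array([3.0]), np.array([2.0]), np.array([2.0])]
b = np.array([4.0])

def solve_block(k, lam, box=4):
    """Subproblem k: min (c_k + A_k^T lam)^T x_k  s.t.  D_k x_k <= e_k,
    with x_k integer in [0, box]^2, solved here by brute force."""
    best_val, best_x = np.inf, None
    for cand in itertools.product(range(box + 1), repeat=2):
        x = np.array(cand, dtype=float)
        if np.all(D[k] @ x <= e[k]):
            val = (c[k] + A[k].T @ lam) @ x
            if val < best_val:
                best_val, best_x = val, x
    return best_val, best_x

# The constant term -lambda^T b is added once at the end instead of being
# attached to subproblem 1 as in the formulas above; the total is the same.
lam = np.array([0.5])
z_lr = sum(solve_block(k, lam)[0] for k in range(3)) - lam @ b
print("z_LR(lambda) =", z_lr)  # a lower bound on the original optimum
```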

Lagrangian Relaxation: Algorithms

The Lagrangian relaxation algorithm has two roles: on the one hand, it provides a lower bound on the original problem (by solving for $z_{LD}$); on the other hand, it can be developed into a Lagrangian relaxation heuristic.

Subgradient optimization algorithm

A subgradient is very similar to a gradient: it extends the gradient to points where the function is not differentiable. Subgradients are introduced here because $z_{LR}(\lambda)$ is a piecewise linear function of $\lambda$, as explained below.

Following the numbering in [2], the relevant definitions and theorems are introduced below; detailed proofs can be found there.

Definition 7.4.1 A function $g:\mathbb{R}^m\to\mathbb{R}$ satisfying
$$g(\alpha x^1+(1-\alpha)x^2)\ge\alpha g(x^1)+(1-\alpha)g(x^2), \quad \forall x^1, x^2\in \mathbb{R}^m,\ 0 \le \alpha \le 1$$
is called a concave function.

Theorem 7.4.1 If the feasible set $Q$ of the Lagrangian-relaxed problem ($z_{LR}$) is a finite set of integer points, then the objective function
$$z_{LR}(\lambda)=\min\ c^T x+\lambda^T(Ax-b)$$
is a concave function of $\lambda$. (Proved from the definition.)
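
The idea of the proof can be stated in one line. Writing

$$z_{LR}(\lambda) = \min_{x \in Q}\ \left[\,c^T x + \lambda^T (Ax - b)\,\right],$$

each bracketed term is affine in $\lambda$ for fixed $x \in Q$, and a pointwise minimum of finitely many affine functions is concave (and piecewise linear).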

Theorem 7.4.2 A differentiable function $g(x)$ is concave if and only if for every $x^*\in\mathbb{R}^m$ there exists $s=(s_1,s_2,\dots,s_m)^T\in\mathbb{R}^m$ such that
$$g(x^*)+s^T(x-x^*)\ge g(x), \quad \forall x\in\mathbb{R}^m.$$
(Analogous to the first-order characterization of convex functions.)

The figure below[1] illustrates the conclusion of Theorem 7.4.1 ($u$ in the figure denotes $\lambda$). As $\lambda$ varies, $z_{LR}(\lambda)$ attains its minimum at different points $x$ ($x^1$, $x^2$, $x^3$, $x^4$); for fixed $x$, $z_{LR}(\lambda)$ is a linear function of $\lambda$. In summary, $z_{LR}(\lambda)$ is a piecewise linear function of $\lambda$, and a concave one. Along each line segment, the minimizing point $x$ stays the same.
Piecewise-linear diagram

Definition 7.4.2 Let $g:\mathbb{R}^m\to\mathbb{R}$ be a concave function. If at a point $x^*\in\mathbb{R}^m$ a vector $s\in\mathbb{R}^m$ satisfies
$$g(x^*)+s^T(x-x^*)\ge g(x), \quad \forall x\in \mathbb{R}^m$$
then $s$ is called a subgradient of $g(x)$ at $x^*$. The set of all subgradients of $g(x)$ at $x^*$ is denoted $\partial g(x^*)$. (Intuitively, for a function of one variable, a subgradient can be any value between the left and right derivatives at $x^*$.)

Theorem 7.4.3 If $g(x)$ is a concave function, then $x^*$ is an optimal solution of $\max\ \{g(x)\mid x\in\mathbb{R}^m\}$ if and only if $0\in\partial g(x^*)$.

The subgradient optimization algorithm is as follows.

  1. STEP 1: Choose an initial Lagrangian multiplier $\lambda^1$ and set $t=1$.
  2. STEP 2: For the current iterate $\lambda^t$, solve the inner minimization problem of the dual and choose a subgradient $s^t$ from $\partial z_{LR}(\lambda^t)$. If $s^t=0$, then $\lambda^t$ is an optimal solution; stop. Otherwise set $\lambda^{t+1}=\max\{\lambda^{t}+\theta_t s^t,\ 0\}$ (componentwise), let $t=t+1$, and repeat STEP 2. (A runnable sketch combining these steps with the rules below appears at the end of this subsection.)

In practice one cannot iterate infinitely many times, so we must specify how the step size $\theta_t$ is determined and when the algorithm stops.

  • Determining $\theta_t$
    The practical goal is to obtain an acceptable lower bound as quickly as possible, so heuristic rules are commonly used; there are two main types.
    The first type makes $\theta_t$ decay geometrically, so that few iterations are needed. The update formula is
    $$\theta_t=\theta_0\rho^t, \quad 0<\rho<1$$
    The second type uses the gap between the upper and lower bounds of the dual problem to adjust how fast $\theta_t$ changes. The update formula is
    $$\theta_t=\frac{z_{UP}(t)-z_{LB}(t)}{\|s^t\|^2}\,\beta_t$$
    where $0\le\beta_t\le2$, usually starting from $\beta_0=2$. While $z_{LR}(\lambda)$ keeps rising, $\beta_t$ stays unchanged; when $z_{LR}(\lambda)$ fails to improve within a given number of steps, $\beta_t$ is halved. Here $z_{UP}(t)$ is an upper bound on the optimal objective value of the original problem (and hence on the dual problem), which can be obtained or estimated from a feasible solution and gradually tightened as $t$ grows; $z_{LB}(t)$ is a lower bound, usually taken as $z_{LB}(t)=z_{LR}(\lambda^t)$, although sometimes a fixed value is used to simplify the computation.
  • Stopping conditions
    There are four main stopping conditions.
    The first is to cap the number of iterations at $T$. This is the simplest rule and makes the computational cost easy to control, but the quality of the solution cannot be guaranteed.
    The second is $\|s^t\|\le\epsilon$ ($\epsilon$ a small positive number), an approximation of the ideal condition $s^t=0$.
    The third is $z_{UP}(t)=z_{LB}(t)$, which shows that an optimal solution of the original problem has been found, with $z=z_{UP}(t)=z_{LB}(t)$.
    The fourth is that $\lambda^t$ or $z_{LR}(\lambda^t)$ changes by less than a given amount within a specified number of steps; the objective value can then be considered settled, and the computation stops.
    In a specific application, one of these stopping conditions can be used alone, or several can be combined.
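
Putting the pieces together, below is a minimal Python sketch of the subgradient algorithm on a toy 0-1 knapsack, using the geometric step-size rule and two of the stopping conditions above. All problem data are hypothetical. For this relaxation, $s^t = w^T x^* - W$ is a valid subgradient of $z_{LR}$ at $\lambda^t$, where $x^*$ solves the inner minimization.

```python
import numpy as np

# Toy 0-1 knapsack written as a minimization:
#   min -p^T x   s.t.   w^T x <= W,   x in {0,1}^n,
# where the capacity constraint plays the role of the complicating
# constraint Ax <= b. Profits p, weights w, capacity W are hypothetical.
p = np.array([10.0, 7.0, 5.0, 3.0])
w = np.array([6.0, 4.0, 3.0, 2.0])
W = 9.0

def solve_relaxation(lam):
    """z_LR(lam) = min_x [-p^T x + lam * (w^T x - W)] over x in {0,1}^n.
    Separable: x_j = 1 exactly when its coefficient lam*w_j - p_j < 0.
    Returns the optimal value, a subgradient, and the minimizer."""
    x = (lam * w - p < 0).astype(float)
    z_lr = -p @ x + lam * (w @ x - W)
    return z_lr, w @ x - W, x

lam, theta0, rho = 0.0, 1.0, 0.9   # STEP 1 plus a geometric step-size rule
best_lb = -np.inf
for t in range(200):               # stopping rule 1: at most T iterations
    z_lr, s, x = solve_relaxation(lam)
    best_lb = max(best_lb, z_lr)   # best lower bound found so far
    if abs(s) < 1e-8:              # stopping rule 2: ||s^t|| <= epsilon
        break
    lam = max(lam + theta0 * rho**t * s, 0.0)  # STEP 2: projected ascent

print("best lower bound on the optimum:", best_lb)
```

With more than one relaxed constraint, $\lambda$ becomes a vector and the projection $\max\{\cdot,\ 0\}$ is applied componentwise.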

Lagrangian relaxation heuristic algorithm

The content above included the following conclusion[4] (the red box should read $z_{LD}$): comparing the original problem on the left with the dual problem on the right, one finds that solving the dual enlarges the feasible region, so the good solutions found while solving the dual (produced by the subgradient optimization algorithm) are not necessarily feasible for the original problem. In that case, a heuristic tailored to the structure of the problem is usually used to repair such a solution into a feasible solution of the original problem. The whole process constitutes a Lagrangian relaxation heuristic algorithm.
Comparison of primal and dual problems
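
Continuing the knapsack sketch from the previous subsection, a minimal repair step might look like the following; the greedy drop rule is just one hypothetical choice, and real Lagrangian heuristics exploit problem-specific structure.

```python
import numpy as np

def repair_to_feasible(x, p, w, W):
    """If the dual solution x violates the relaxed constraint w^T x <= W,
    greedily drop packed items with the worst profit/weight ratio until
    x becomes feasible for the original knapsack problem."""
    x = x.copy()
    packed = sorted(np.flatnonzero(x), key=lambda j: p[j] / w[j])
    for j in packed:           # worst profit/weight ratio first
        if w @ x <= W:
            break
        x[j] = 0.0
    return x

# Usage with the earlier sketch: any repaired (feasible) x gives an upper
# bound -p^T x on the minimum, which can also serve as z_UP(t) in the
# step-size formula above.
# x_feasible = repair_to_feasible(x, p, w, W)
```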

References


  1. Fisher, M. L. 1981. The Lagrangian Relaxation Method for Solving Integer Programming Problems. Management Science 27(1): 1-18.

  2. Xing Wenxun, Xie Jinxing. Modern Optimization Computation Methods (2nd ed.). Tsinghua University Press, 2005: 210-242.

  3. Wang Yuan. Analysis of Integer Programming by Lagrangian Relaxation (with a Python Code Example).

  4. TransNET. Lagrangian relaxation problem.

  5. Q&A thread: why are the multipliers in the Lagrange multiplier method required to be greater than 0?

  6. Lagrangian relaxation slides (PPT).
