Lagrange Duality

1. From the Original Problem to the Minimax Problem

  Duality theory is an important part of optimization. Constrained optimization problems arise frequently in machine learning and can be written in the following form:
\[\begin{aligned} \min \;\; & f(x) \\ \mathrm{s.t.} \;\; & g_i(x) \le 0, \;\; i = 1, \cdots, m \\ & h_i(x) = 0, \;\; i = 1, \cdots, n \end{aligned}\]
The constraints shrink the space that must be searched, but in machine learning the constraints are often complicated, so working out the feasible region and then optimizing over it is inconvenient. The generalized Lagrangian converts the constrained optimization problem into an unconstrained one:
\[L(x, \lambda, \eta) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{i=1}^{n} \eta_i h_i(x)\]
At this point, if we followed the ordinary method of Lagrange multipliers and simply set the partial derivatives with respect to \(x, \lambda, \eta\) to zero, the result would do nothing to simplify the complicated constraints. We want a formulation that both recovers the original problem and simplifies the computation, so we look more closely at what \(\lambda, \eta\) can bring us. Consider maximizing the generalized Lagrangian over \(\lambda, \eta\):
\[\theta_P(x) = \underset{\lambda \ge 0, \eta}{\max} \; L(x, \lambda, \eta)\]
where \(\lambda \ge 0\) is required. It is easy to see that in this maximization, if \(x\) does not satisfy the constraints of the original problem, the maximum must be positive infinity. For example, if \(g_i(x) > 0\), then maximizing over \(\lambda, \eta\) sends the corresponding \(\lambda_i\) to infinity, which sends the whole expression to infinity. When \(x\) does satisfy the constraints, the maximum must be \(f(x)\). Using this property, we can restate the original problem as a two-step minimization of the generalized Lagrangian:
\[\underset{x}{\min} \; \theta_P(x) = \underset{x}{\min} \; \underset{\lambda \ge 0, \eta}{\max} \; L(x, \lambda, \eta)\]

Here \(L(x, \lambda, \eta)\) is called the generalized Lagrangian. The decomposed problem \(\underset{x}{\min} \; \underset{\lambda \ge 0, \eta}{\max} \; L(x, \lambda, \eta)\) is a minimax problem, and it is completely equivalent to the original problem. In duality theory, this minimax problem is known as the primal problem.
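As a quick sanity check of this property, here is a minimal sketch on a toy problem of my own (a hypothetical example, not from the text): minimize \(f(x) = x^2\) subject to \(g(x) = 1 - x \le 0\). On a feasible point the inner maximization settles at \(\lambda = 0\) and returns \(f(x)\); on an infeasible point it grows without bound as \(\lambda\) increases.

```python
# Toy problem (hypothetical example): min x^2  s.t.  g(x) = 1 - x <= 0.

def f(x):
    return x ** 2

def g(x):
    return 1.0 - x

def L(x, lam):
    # Generalized Lagrangian (no equality constraints in this toy problem).
    return f(x) + lam * g(x)

def theta_P(x, lam_grid):
    # Approximate the max over lambda >= 0 with a finite grid.
    return max(L(x, lam) for lam in lam_grid)

lams = [0.0, 1.0, 10.0, 100.0, 1000.0]

# Feasible x = 2 (g(2) = -1 <= 0): the max is attained at lambda = 0,
# so theta_P(2) = f(2) = 4.
print(theta_P(2.0, lams))  # 4.0

# Infeasible x = 0 (g(0) = 1 > 0): L(0, lambda) = lambda is unbounded above,
# so theta_P(0) only grows as the lambda grid is extended.
print(theta_P(0.0, lams))  # 1000.0 on this grid; -> +infinity in the limit
```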

  From the primal minimax problem we can derive the dual problem, which simply swaps the positions of the min and the max. First define
\[\theta_D(\lambda, \eta) = \underset{x}{\min} \; L(x, \lambda, \eta)\]
Then the dual problem is
\[\underset{\lambda \ge 0, \eta}{\max} \; \theta_D(\lambda, \eta) = \underset{\lambda \ge 0, \eta}{\max} \; \underset{x}{\min} \; L(x, \lambda, \eta)\]
This maximin problem over the generalized Lagrangian can be expanded into a constrained optimization problem:
\[\begin{aligned} \underset{\lambda, \eta}{\max} \; & \theta_D(\lambda, \eta) = \underset{\lambda, \eta}{\max} \; \underset{x}{\min} \; L(x, \lambda, \eta) \\ \mathrm{s.t.} \; & \lambda_i \ge 0, \;\; i = 1, 2, \cdots, m \end{aligned}\]
Notice that the two problems do not optimize over the same variables: for the primal problem the variable is \(x\), while for the dual problem the variables are \(\lambda, \eta\). Moreover, the two problems are not equivalent, and their optimal values can even differ. By analogy: the strongest table-tennis player of another country is not necessarily as strong as the weakest table-tennis player of China. Of course, this analogy is not precise.
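To make the dual concrete, here is a sketch on a toy problem of my own (a hypothetical example, not from the text): for \(\min x^2\) s.t. \(1 - x \le 0\), the inner minimization over \(x\) has the closed form \(x = \lambda/2\), so \(\theta_D(\lambda) = \lambda - \lambda^2/4\), a function of the dual variable alone.

```python
# Dual function of the toy problem min x^2 s.t. 1 - x <= 0 (hypothetical example).

def theta_D(lam):
    x_star = lam / 2.0                         # argmin_x of x^2 + lam * (1 - x)
    return x_star ** 2 + lam * (1.0 - x_star)  # = lam - lam^2 / 4

# theta_D is concave; a coarse grid search over lambda >= 0 locates its maximum.
best_lam = max((k / 100.0 for k in range(0, 401)), key=theta_D)
print(best_lam, theta_D(best_lam))  # 2.0 1.0 -- matches p* = f(1) = 1
```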

2. Weak Duality and Strong Duality

  The dual function can be understood as giving a lower bound on the original objective: when the original problem is hard to solve, an approximate value can be obtained by solving the dual problem instead, and when certain conditions are met, solving the dual problem is equivalent to solving the original problem. Specifically, the dual function \(\theta_D(\lambda, \eta) = \underset{x}{\min} L(x, \lambda, \eta)\) determines a lower bound for the primal problem:
\[\theta_D(\lambda, \eta) = \underset{x}{\min} L(x, \lambda, \eta) \le L(x, \lambda, \eta) \le \underset{\lambda \ge 0, \eta}{\max} \; L(x, \lambda, \eta) = \theta_P(x) \tag{2-a}\]

That is,
\[\theta_D(\lambda, \eta) \le \theta_P(x)\]
In the analogy above, \(\theta_D(\lambda, \eta)\) stands for the table-tennis players of other countries and \(\theta_P(x)\) for the Chinese players: even the strongest of the former is not necessarily better than the weakest of the latter. Hence
\[d^* = \underset{\lambda \ge 0, \eta}{\max} \; \theta_D(\lambda, \eta) \le \underset{x}{\min} \; \theta_P(x) = p^* \tag{2-b}\]
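Inequality \((2\text{-}b)\) can be checked numerically on a toy problem (a hypothetical example, not from the text): for \(\min x^2\) s.t. \(1 - x \le 0\), every dual value lower-bounds every feasible objective value.

```python
# Weak duality check on the toy problem min x^2 s.t. 1 - x <= 0 (hypothetical).

def theta_D(lam):
    x_star = lam / 2.0                         # inner minimizer of the Lagrangian
    return x_star ** 2 + lam * (1.0 - x_star)  # = lam - lam^2 / 4

feasible_xs = [1.0, 1.5, 2.0, 5.0]             # all satisfy 1 - x <= 0
for lam in [0.0, 0.5, 2.0, 3.0, 10.0]:
    # Every dual value is <= every feasible objective value f(x) = x^2.
    assert all(theta_D(lam) <= x ** 2 for x in feasible_xs)
print("weak duality holds on all tested pairs")
```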
This property is called weak duality. Weak duality holds for any optimization problem, and it may seem unremarkable: a lower bound need not be tight, and sometimes it is so loose that it is of little help in approximating the solution of the original problem. Since there is weak duality, there is also strong duality, which refers to
\[d^* = p^*\]
This is a surprising property: it means the solution of the original problem can be obtained by solving the simpler dual problem (simpler because the dual problem is always a convex optimization problem). But strong duality is a difficult topic in optimization, certainly for me, so here I can only introduce two conditions related to strong duality: Slater's condition and the KKT conditions.

3. KKT conditions

  Slater's condition requires that the original problem be convex (the objective \(f\) and the \(g_i\) convex, the \(h_i\) affine) and that there exist a strictly feasible point, i.e. a point at which every inequality constraint holds with strict inequality rather than equality; under these conditions strong duality holds. For the SVM, this condition amounts to requiring that some hyperplane can divide all the points correctly, i.e. that the data set is linearly separable. Slater's condition is a sufficient condition for strong duality, but not a necessary one: some problems that do not satisfy Slater's condition still enjoy strong duality.

  The KKT conditions describe, in the case where strong duality holds, the relations that the variables must satisfy at the optimum. Suppose the optima of the primal and dual problems are attained at \(x^*\) and \((\lambda^*, \eta^*)\), with optimal values \(p^*\) and \(d^*\) respectively. Strong duality gives \(p^* = d^*\). Substituting the optimal point,
\[d^* = \theta_D(\lambda^*, \eta^*) = \underset{x}{\min} \; L(x, \lambda^*, \eta^*) \tag{3-a}\]
Since strong duality forces the chain \((3\text{-}c)\) below to hold with equality, \(x^*\) is a minimizer of \(L(x, \lambda^*, \eta^*)\), so the gradient of \(L(x, \lambda^*, \eta^*)\) at \(x^*\) is zero, i.e.
\[\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla g_i(x^*) + \sum_{i=1}^{n} \eta_i^* \nabla h_i(x^*) = 0 \tag{3-b}\]
By formula \((2\text{-}a)\),
\[\begin{aligned} d^* = & \underset{x}{\min} \; L(x, \lambda^*, \eta^*) \\ \le & L(x^*, \lambda^*, \eta^*) \\ = & f(x^*) + \sum_{i=1}^{m} \lambda_i^* g_i(x^*) + \sum_{i=1}^{n} \eta_i^* h_i(x^*) \\ \le & \; p^* = f(x^*) \end{aligned} \tag{3-c}\]
Since \(p^* = d^*\), the inequalities in this chain must hold with equality, and the equality in the last step gives
\[\sum_{i=1}^{m} \lambda_i^* g_i(x^*) + \sum_{i=1}^{n} \eta_i^* h_i(x^*) = 0 \tag{3-d}\]
Note that since \(x^*\) is a solution of the original problem, it must satisfy \(h_i(x^*) = 0\), so the second sum vanishes; and because each term \(\lambda_i^* g_i(x^*) \le 0\) while the first sum is zero, every term must vanish individually, giving
\[\lambda_i^* g_i(x^*) = 0, \;\;\; i = 1, 2, \cdots, m\]
This is called complementary slackness.

  The condition \(\lambda \ge 0\) is called dual feasibility. It seems to follow simply from the construction of the minimax primal problem, but there is another, more geometric explanation. To simplify, consider only a single inequality constraint:
\[\begin{aligned} \min \;\; & f(x) \\ \mathrm{s.t.} \;\; & g(x) \le 0 \end{aligned}\]
Here \(g(x) \le 0\) is called primal feasibility, and the region it determines is called the feasible region. Suppose \(x^*\) is the solution of the problem; its position falls into one of two cases:

  • (1) \(g(x^*) < 0\): the solution lies in the interior of the feasible region. It is then called an interior solution; the constraint is inactive, and the original problem reduces to an unconstrained problem.

  • (2) \(g(x^*) = 0\): the solution lies on the boundary. It is then called a boundary solution, and the constraint is active.

An interior solution can be found directly by setting the gradient to zero, so here we focus on the boundary-solution case.
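The two cases can be illustrated on toy problems (hypothetical examples, not from the text): shifting the constraint moves the optimum from the boundary into the interior, and the multiplier switches off accordingly.

```python
# Interior vs. boundary solutions for min x^2 under one inequality constraint
# (hypothetical examples).

# Boundary case: min x^2 s.t. 1 - x <= 0. The unconstrained minimum x = 0 is
# infeasible, so the optimum sits on the boundary: x* = 1, with lambda* = 2
# (from the stationarity condition 2*x* + lambda* * (-1) = 0). Constraint active.
x_star, lam_star = 1.0, 2.0
assert 1.0 - x_star == 0.0              # g(x*) = 0: boundary solution
assert lam_star * (1.0 - x_star) == 0.0

# Interior case: min x^2 s.t. -1 - x <= 0 (i.e. x >= -1). The unconstrained
# minimum x = 0 is feasible, so x* = 0 is an interior solution and the
# constraint plays no role: lambda* = 0.
x_star, lam_star = 0.0, 0.0
assert -1.0 - x_star < 0.0              # g(x*) < 0: interior solution
assert lam_star * (-1.0 - x_star) == 0.0
print("both cases satisfy lambda* * g(x*) = 0")
```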

  For the boundary case the constraint becomes \(g(x) = 0\), and we build the Lagrangian
\[L(x, \lambda) = f(x) + \lambda g(x)\]
Since \(x^*\) is a stationary point of this function, its gradient at \(x^*\) is zero, i.e.
\[\nabla f(x^*) + \lambda \nabla g(x^*) = 0\]
Here the directions of the two gradients can be determined. \(f(x)\) attains its minimum on the boundary, so inside the feasible region \(f(x)\) must be larger than this minimum, which means \(\nabla f\) points into the feasible region. \(\nabla g\), on the other hand, points out of the feasible region: the constraint is \(g(x) \le 0\), so \(g(x) > 0\) outside, and the gradient points in the direction in which the function increases. Thus the two gradients point in opposite directions, and for the equation above to hold, \(\lambda\) can only be zero or positive. This is dual feasibility.
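As a worked instance of this geometry (a hypothetical example, not from the text), take \(f(x) = x^2\) with constraint \(g(x) = 1 - x \le 0\), whose feasible region is \(x \ge 1\) and whose boundary solution is \(x^* = 1\):
\[\nabla f(x^*) = 2x^* = 2, \qquad \nabla g(x^*) = -1, \qquad \nabla f(x^*) + \lambda \nabla g(x^*) = 2 - \lambda = 0 \;\Rightarrow\; \lambda = 2 > 0\]
Indeed \(\nabla f\) points into the feasible region (toward larger \(x\), where \(f\) grows) while \(\nabla g\) points out of it, so the multiplier balancing them is nonnegative.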

  Combining this with the other conditions (including primal feasibility of the equality constraints), we obtain the KKT conditions:
\[\begin{aligned} \nabla_x L(x^*, \lambda^*, \eta^*) &= 0 \\ g_i(x^*) &\le 0 \\ h_i(x^*) &= 0 \\ \lambda_i^* &\ge 0 \\ \lambda_i^* g_i(x^*) &= 0 \end{aligned}\]
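As a final check, the KKT conditions can be verified mechanically at the candidate point of the running toy problem (a hypothetical example, not from the text; one inequality constraint and no equality constraints, so the \(\eta\) terms drop out):

```python
# KKT check for min x^2 s.t. g(x) = 1 - x <= 0 at x* = 1, lambda* = 2
# (hypothetical example; no equality constraints, so the eta terms vanish).

def grad_f(x):
    return 2.0 * x

def g(x):
    return 1.0 - x

def grad_g(x):
    return -1.0

x_star, lam_star = 1.0, 2.0

assert grad_f(x_star) + lam_star * grad_g(x_star) == 0.0  # stationarity
assert g(x_star) <= 0.0                                   # primal feasibility
assert lam_star >= 0.0                                    # dual feasibility
assert lam_star * g(x_star) == 0.0                        # complementary slackness
print("all four KKT conditions hold")
```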


Origin www.cnblogs.com/breezezz/p/11303722.html