In-depth understanding and derivation of KKT conditions in machine learning SVM

This article is aimed at readers looking for material on the KKT conditions, and it assumes familiarity with the soft-margin (relaxed) SVM model and the SMO algorithm. If you need to review them, see my earlier detailed derivation of the support vector machine and the SMO algorithm. Although that article was written during my undergraduate years and was rough, it was revised again before this article was published (even if the layout is still ugly), so with a little patience it should be understandable. If you prefer video derivations, the machine learning column by the up master "Mr. Dahai" is a treasure I found and worth watching.

What we know so far

So far we have derived the Lagrangian models of both the hard-margin (unrelaxed) and soft-margin (relaxed) SVM, and from the recurrence relation we obtained the iterative update for α2 shown below.

[Figure: the iterative update formula for α2]

SMO is a heuristic algorithm: it repeatedly chooses two of the α multipliers and optimizes them jointly, iterating until the termination condition is met.
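In [1] the unclipped update for the second multiplier is α2_new = α2 + y2(E1 − E2)/η with η = K11 + K22 − 2K12, where Ei is the prediction error on example i and Kij the kernel value. A minimal sketch of that step, under those standard definitions (the function and variable names are mine, not from the article):

```python
def alpha2_unclipped_update(alpha2, y2, E1, E2, K11, K22, K12):
    """Unclipped SMO update for the second multiplier.

    eta = K11 + K22 - 2*K12 is the second derivative of the objective
    along the constraint line; the standard update (before clipping
    to [L, H]) is alpha2 + y2 * (E1 - E2) / eta.
    """
    eta = K11 + K22 - 2.0 * K12
    if eta <= 0:
        # Degenerate direction; a full solver treats this case separately.
        raise ValueError("eta <= 0: objective not strictly convex here")
    return alpha2 + y2 * (E1 - E2) / eta
```

The result still has to be clipped into the feasible box, which is exactly what the bounds L and H discussed below are for.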
For the joint update of the two chosen multipliers we have the following basic subproblem [1], which we refer to as Figure 1:

[Figure 1: the SMO two-variable subproblem]

Here min W(α1, α2) is the objective function; each multiplier satisfies 0 ≤ αi ≤ C, where C is the penalty factor (usually fixed as a constant at the start); and the equality constraint is a1y1 + a2y2 = l. (As shorthand from here on, α is written a and the constant ζ is written l; since only a1 and a2 are being optimized, the remaining summation terms are collected into the single constant l, which simplifies the derivation.)

Readers may find the KKT conditions of the SVM puzzling. Why are upper and lower bounds taken at all, and where do the min and max formulas come from?

[Figure: the formulas for the lower bound L and the upper bound H of the new α2]

First, why upper and lower bounds are needed: when a constrained problem is solved by a heuristic algorithm, the iterates may break some of the constraints and leave the feasible region. The bounds L and H are what pull each iterate back into the solution space; that is where they come from.
The second question, which is the focus of this article, is where the arguments of the min and max functions in L and H come from.
The clipping matters precisely when α1 and α2 violate the KKT conditions, so we must first write down the KKT conditions; only then can we characterize the violations and explain the min/max formulas.

KKT conditions

The figure below relates several concepts in constrained optimization: necessary and sufficient conditions, constraint qualifications such as LCQ and LICQ, the KKT conditions, Slater's condition, and so on. This article only needs the case in the upper-right corner, but the rest is shown to give a macro perspective. (It is not required for the problem at hand; it is included to make the article more complete and convincing.)

[Figure: relations among constraint qualifications, the KKT conditions, and Slater's condition]
Next comes the main topic.

[Figure: upper left, a schematic of linear SVM classification with points A, B, C, D, E, F; lower left, a summary table of the KKT conditions]

The upper-left corner of the figure is a schematic of linear SVM classification containing the points A through F. We now discuss the αi geometrically, case by case:
① When 0 < α < C: from C − α − r = 0 (r is the Lagrange multiplier attached to the slack variable ξ in the soft-margin SVM Lagrangian) we get r > 0, and the complementary-slackness condition rξ = 0 (a KKT condition stated with the model earlier) then forces ξ = 0.
② When α = C: C − α − r = 0 gives r = 0, so rξ = 0 places no restriction on ξ, which may be any value ≥ 0.
③ When α = 0: the primal constraint yi(wx + b) − 1 + ξ ≥ 0 is the original constraint of the problem; with α = 0 it is inactive and holds by itself.
From these cases we conclude that the KKT conditions are satisfied as follows:

[Figure: the KKT conditions by case]

When α = 0: case ③ holds automatically; moreover r = C > 0 forces ξ = 0, so yi(wx + b) ≥ 1.
When 0 < α < C: ξ = 0 from ①, and complementary slackness α[yi(wx + b) − 1 + ξ] = 0 with α > 0 gives yi(wx + b) = 1.
When α = C: r = 0 and rξ = 0 allow any ξ ≥ 0, so yi(wx + b) ≤ 1.
In summary, this is the table in the lower-left corner of the figure above (the KKT conditions when they hold), where ui = wx + b.
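The three cases above can be checked programmatically. A minimal sketch, assuming u = wx + b is already computed (the function name, parameters, and the small tolerance, which practical SMO implementations also use, are mine):

```python
def satisfies_kkt(alpha, y, u, C, tol=1e-3):
    """Check the soft-margin SVM KKT conditions for one example.

    u = w.x + b is the raw decision value. The three cases are:
        alpha == 0      =>  y*u >= 1
        0 < alpha < C   =>  y*u == 1
        alpha == C      =>  y*u <= 1
    A tolerance `tol` absorbs numerical error.
    """
    r = y * u - 1.0
    if alpha < tol:           # effectively alpha == 0
        return r >= -tol
    if alpha > C - tol:       # effectively alpha == C
        return r <= tol
    return abs(r) <= tol      # interior case 0 < alpha < C
```

The violation cases of the next section are simply the inputs for which this check returns False.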

Violations of KKT Conditions

With the KKT conditions in hand, the situations that violate them can now be written out:

[Figure: the cases in which α violates the KKT conditions]

When any of these occurs, the equality and box constraints are used to clip the iterate so that the solution satisfies the constraints again.

A deeper look at SMO's two-α iteration

Reconsider the formula a1y1 + a2y2 = l (recall the notational substitutions made above).
Since y1 and y2 each take values ±1, only two cases need to be considered: y1 and y2 with the same sign, or with opposite signs.
Treat a1 and a2 as the horizontal and vertical coordinates of a graph; the constraint is then a line of slope ±1 and intercept ±l. Because 0 ≤ α ≤ C, the feasible region is further limited to a square of side length C.
Drawing the same-sign and opposite-sign cases gives the schematic diagrams below, so the min and max bounds on α1 and α2 can be read off directly from where the line crosses the square's boundary.

[Figure: the feasible line segment inside the square [0, C]² when y1 ≠ y2 and when y1 = y2]
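Reading the intersection points off the two diagrams gives the standard bounds from [1]: for y1 ≠ y2, L = max(0, α2 − α1) and H = min(C, C + α2 − α1); for y1 = y2, L = max(0, α1 + α2 − C) and H = min(C, α1 + α2). A minimal sketch (the function name is mine):

```python
def alpha2_bounds(alpha1, alpha2, y1, y2, C):
    """Box bounds [L, H] for the new alpha2.

    Intersecting the line a1*y1 + a2*y2 = l with the square [0, C]^2
    leaves a segment; its projection onto the alpha2 axis is [L, H].
    """
    if y1 != y2:
        # Opposite signs: a2 - a1 is constant along the line.
        L = max(0.0, alpha2 - alpha1)
        H = min(C, C + alpha2 - alpha1)
    else:
        # Same sign: a1 + a2 is constant along the line.
        L = max(0.0, alpha1 + alpha2 - C)
        H = min(C, alpha1 + alpha2)
    return L, H
```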
Putting all of this together yields the following comprehensive model:

[Figures: the complete update formulas, the bounds L and H, and the clipped solution]
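After clipping α2 into [L, H], α1 is recovered from the equality constraint a1y1 + a2y2 = const. A minimal sketch of that final step (the helper name is mine, not from [1]):

```python
def clip_and_update(alpha1, alpha2, alpha2_unclipped, y1, y2, L, H):
    """Clip the unclipped alpha2 to [L, H], then recover alpha1
    from the equality constraint a1*y1 + a2*y2 = const."""
    a2_new = min(max(alpha2_unclipped, L), H)
    # The constraint gives a1_new = a1 + y1*y2*(a2_old - a2_new).
    a1_new = alpha1 + y1 * y2 * (alpha2 - a2_new)
    return a1_new, a2_new
```

Note that the update preserves a1y1 + a2y2, so the equality constraint keeps holding across iterations.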
That concludes the SVM material. It took me more than a month to understand this chapter thoroughly; between courses, research, and other obligations I read and digested it intermittently, finishing only today. I am writing down my own understanding to share with readers who have run into the same difficulties.

references

[1] Li Hang, Statistical Learning Methods (2nd ed.), Tsinghua University Press, 2019.

Origin blog.csdn.net/wlfyok/article/details/127622765