[Machine Learning] Support Vector Machine SVM

The support vector machine involves several difficult topics such as linear programming, convex optimization, and matrix analysis. This is my third time learning it, and it now seems there will surely be a fourth and a fifth. Modern machine learning and deep learning have largely moved away from support vector machines, but the ideas behind them are still worth pondering over and over. Studying this kind of material also reminds me how important mathematical ability is on the academic road.

(There may be many mistakes in this article; I hope you will correct me.)

1. Prior knowledge of mathematics

1.1. Distance from point to plane

The discussion here applies not only in two dimensions but also in higher dimensions:

First consider how a hyperplane is represented. Suppose the unit normal vector of the plane is $\omega$. Then any point $x$ on this plane satisfies: the inner product of $x$ with the normal vector equals the distance $h$ from the origin to the hyperplane (because the vector from the origin to a point on the plane can be decomposed into a component along the normal vector and a component perpendicular to it). So the plane can be expressed as:
$$\omega x = h$$
This is usually written as:
$$\omega x + b = 0$$
Then the distance from any point $x_i$ to this plane is:
$$d = \omega x_i - h = \omega x_i + b$$
Of course, we assumed that the normal vector is a unit normal vector. If it is not, the distance should be:
$$d = \frac{\omega x_i + b}{\|\omega\|}$$
Note that this distance can be positive or negative; the sign indicates which side of the hyperplane the point lies on.
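As a quick numerical check of this formula, here is a minimal NumPy sketch; the hyperplane and the points below are made-up illustrative values, not taken from anywhere in this article:

```python
import numpy as np

# Made-up hyperplane w.x + b = 0 and some made-up points
w = np.array([3.0, 4.0])          # normal vector (not unit length here)
b = -5.0
X = np.array([[1.0, 2.0],         # each row is a point x_i
              [0.0, 0.0],
              [3.0, -1.0]])

# d_i = (w . x_i + b) / ||w||; the sign says which side of the hyperplane x_i is on
d = (X @ w + b) / np.linalg.norm(w)
print(d)                          # [ 1.2 -1.   0. ]
```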

1.2. Lagrange multiplier method

The Lagrange multiplier method, learned in multivariable calculus, is used to solve extremum problems under constraints.

Let’s first look at the solution to extreme value problems with equality constraints, such as the following optimization problem:
$$\begin{array}{c} \min_{w} f(w) \\ \text{ s.t. } h_{i}(w)=0, \quad i=1, \ldots, l \end{array}$$
The objective function is $f(w)$, subject to the equality constraints above. The usual approach is to introduce Lagrange multipliers, here denoted by $\beta$, giving the Lagrangian:
$$\mathcal{L}(w, \beta)=f(w)+\sum_{i=1}^{l} \beta_{i} h_{i}(w)$$
Next, we only need to set the partial derivatives to zero to solve for $w$.

(More details will be added here when I get the chance.)
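As a small worked example of the equality-constrained case (a toy problem of my own, purely for illustration), minimize $f(w)=w_1^2+w_2^2$ subject to $w_1+w_2-1=0$ with SymPy:

```python
import sympy as sp

# Toy problem: minimize f(w) = w1^2 + w2^2  subject to  h(w) = w1 + w2 - 1 = 0
w1, w2, beta = sp.symbols('w1 w2 beta', real=True)
f = w1**2 + w2**2
h = w1 + w2 - 1
L = f + beta * h                  # Lagrangian L(w, beta) = f(w) + beta * h(w)

# Stationarity: all partial derivatives of the Lagrangian vanish
sol = sp.solve([sp.diff(L, w1), sp.diff(L, w2), sp.diff(L, beta)], [w1, w2, beta])
print(sol)                        # {w1: 1/2, w2: 1/2, beta: -1}
```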

2. Core idea:

2.1. Thought

Find the hyperplane for classification such that the point closest to the hyperplane is as far from the hyperplane as possible.

This idea seems very simple, but putting it into practice requires many optimization methods and techniques, which is quite difficult.
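The idea can be seen directly with scikit-learn (assuming it is installed; the toy data below is made up). With a very large C, a linear SVC behaves like a hard-margin classifier: the samples it keeps as support vectors are exactly the points closest to the separating hyperplane, and the margin width is $2/\|w\|$:

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable toy data (two small clusters)
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 3.0],
              [6.0, 5.0], [7.0, 7.5], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C makes the soft-margin solver behave like a hard-margin one
clf = SVC(kernel='linear', C=1e6).fit(X, y)

w = clf.coef_[0]
print("support vectors:", clf.support_vectors_)   # the points closest to the hyperplane
print("margin width:", 2.0 / np.linalg.norm(w))   # 2 / ||w||
```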

3. Hard margin problem

3.1. Optimization objective and basic transformations

Our goal is to find a way to separate the data onto the two sides of a plane. The data values are $x_i$ with labels $y_i$, and the distance from a data point to the plane is $d_i$ (again, $d$ can be positive or negative). We require that points on the side of the hyperplane with $d>0$ satisfy $y=1$, and points on the side with $d<0$ satisfy $y=-1$. Then $d \cdot y$ equals the absolute value of the distance.

Let the hyperplane be $w^T x + b = 0$. Then our core idea (making the point closest to the hyperplane as far from the hyperplane as possible) can be expressed as follows:

$$\max \min_{i}\ \gamma_i=\frac{y_i\left(w^{T} x_i+b\right)}{\|w\|_{2}}$$
The problem can be described as:
$$\begin{array}{l} \max \min_i \gamma_i=\frac{y_i\left(w x_i+b\right)}{\|w\|_{2}}\\ \text{ s.t. } \quad y_{i}\left[\left(x_{i} \cdot w\right)+b\right] \geq d_{min}, \quad i=1,2, \cdots, l \\ \text{ for } \quad\left(y_{1}, x_{1}\right), \cdots,\left(y_{l}, x_{l}\right),\ y \in\{-1,1\} \end{array}$$

Dividing $d_{min}$ through (absorbing it into $w$ and $b$) gives $w'$ and $b'$, and the constraints can be rewritten as:
$$\begin{array}{l} \max \min_i \gamma_i=\frac{y_i\left(w' x_i+b'\right)}{\|w'\|_{2}}\\ \text{ s.t. } \quad y_{i}\left[\left(x_{i} \cdot w'\right)+b'\right] \geq 1, \quad i=1,2, \cdots, l \\ \text{ for } \quad\left(y_{1}, x_{1}\right), \cdots,\left(y_{l}, x_{l}\right),\ y \in\{-1,1\} \end{array}$$
A simple consideration shows that at the optimum at least one inequality must be tight, i.e. hold with equality. Since the min is taken over $i$ while the denominator of $\gamma$ does not depend on $i$, we can constrain the minimum value of the numerator to be 1, and the goal of the problem becomes minimizing the denominator:

$$\begin{array}{c} \max \frac{1}{\|w'\|_{2}}\\ \text{ s.t. } \quad y_{i}\left[\left(x_{i} \cdot w'\right)+b'\right] \geq 1, \quad i=1,2, \cdots, l \\ \text{ for } \quad\left(y_{1}, x_{1}\right), \cdots,\left(y_{l}, x_{l}\right),\ y \in\{-1,1\} \end{array}$$

This is usually written as:
$$\begin{array}{l} \min \Phi(w)=\frac{1}{2}(w \cdot w) \ \text{ w.r.t. } w \\ \text{ s.t. } \quad y_{i}\left[\left(x_{i} \cdot w\right)+b\right] \geq 1, \quad i=1,2, \cdots, l \\ \text{ for } \quad\left(y_{1}, x_{1}\right), \cdots,\left(y_{l}, x_{l}\right),\ y \in\{-1,1\} \end{array}$$

PS: Regarding how the objective function is derived here, there is actually a more intuitive way to understand it in the MIT course (see the MIT course in the references).
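This is a standard quadratic program, so it can be handed to a generic convex solver. Below is a minimal sketch with cvxpy (assumed to be installed); the separable toy data is made up for illustration:

```python
import numpy as np
import cvxpy as cp

# Made-up linearly separable toy data
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 3.0],
              [6.0, 5.0], [7.0, 7.5], [8.0, 6.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

w = cp.Variable(2)
b = cp.Variable()

# min 1/2 (w . w)   s.t.   y_i [(x_i . w) + b] >= 1
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                     [cp.multiply(y, X @ w + b) >= 1])
problem.solve()

print("w =", w.value, " b =", b.value)
print("margin width =", 2.0 / np.linalg.norm(w.value))
```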

3.2. Lagrangianization of the optimization objective

Now transform the constraints into the equivalent form $1-y_{i}\left(w^{T} x_{i}+b\right) \leq 0$, and then use the Lagrange multiplier method to construct the Lagrangian:
$$L(w, b, \lambda)=\frac{1}{2} w^{T} w+\sum_{i=1}^{N} \lambda_{i}\left[1-y_{i}\left(w^{T} x_{i}+b\right)\right]$$
We want to use $\lambda_{i} \geq 0$ to rule out those $(w, b)$ for which $y_{i}\left(w^{T} x_{i}+b\right)<1$. The analysis is as follows:

  • If $1-y_{i}\left(w^{T} x_{i}+b\right)>0$, then $\max_{\lambda} L(w, b, \lambda)=\frac{1}{2} w^{T} w+\infty=\infty$
  • If $1-y_{i}\left(w^{T} x_{i}+b\right) \leq 0$, then $\max_{\lambda} L(w, b, \lambda)=\frac{1}{2} w^{T} w+0=\frac{1}{2} w^{T} w$

Therefore, $\min_{w, b} \max_{\lambda} L(w, b, \lambda)=\min_{w, b}\left(\infty, \frac{1}{2} w^{T} w\right)=\min_{w, b} \frac{1}{2} w^{T} w$, under the condition $\lambda_{i} \geq 0$.

Therefore, the constrained model is transformed into a model that is unconstrained with respect to $(w, b)$:
$$\left\{\begin{array}{cc} \min_{w, b} \max_{\lambda} & L(w, b, \lambda) \\ \text{ s.t. } & \lambda_{i} \geq 0 \end{array}\right.$$

3.3. Lagrangian duality

Usually it is more convenient to solve a minimization problem (setting the derivative to 0), so we convert the above Lagrangian problem into its dual: the order of $\min$ and $\max$ is reversed, becoming $\max$ $\min$. There is an obvious conclusion here: if we first minimize over one set of variables and then take the largest of those minima, $a$, and separately first maximize and then take the smallest of those maxima, $b$, then $a \leq b$. Written with the parameters below:
$$\max_{\alpha, \beta,\ \alpha_{i} \geq 0} \min_{x} L(x, \alpha, \beta) \leq \min_{x} \max_{\alpha, \beta,\ \alpha_{i} \geq 0} L(x, \alpha, \beta)$$
This relationship is often called weak duality; strong duality means the inequality holds with equality.
Due to a series of properties of this Lagrangian function (the problem is convex with affine constraints), it can be proved that the relation above actually holds as a strong duality. (Maybe I will only understand this slowly after learning convex optimization and coming back to it.)
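The weak-duality inequality itself is easy to check numerically on any finite table of Lagrangian values; here is a tiny sketch with made-up numbers (not the SVM Lagrangian itself):

```python
import numpy as np

# Weak duality checked on a finite table: rows play the role of x,
# columns the role of the multipliers; the values are made up.
rng = np.random.default_rng(0)
L = rng.normal(size=(5, 4))

max_min = L.min(axis=0).max()   # first min over x (rows), then max over the multiplier
min_max = L.max(axis=1).min()   # first max over the multiplier, then min over x
print(max_min, "<=", min_max)   # max-min never exceeds min-max
```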

3.4. Dual problem optimization

First, regard $\lambda$ as a constant and solve for the point $\left(\omega^{*}, b^{*}\right)$ at which $L(\omega, b, \lambda)$ attains its minimum; then impose the constraints on $\lambda$ and optimize over it. The process of finding $\min_{\omega, b} L(\omega, b, \lambda)$ is as follows:
$$\begin{aligned} \frac{\partial L(\omega, b, \lambda)}{\partial b} & =\frac{\partial}{\partial b}\left\{\frac{1}{2} w^{T} \omega+\sum_{i=1}^{N} \lambda_{i}\left[1-y_{i}\left(w^{T} x_{i}+b\right)\right]\right\} \\ & =\frac{\partial}{\partial b}\left(-\sum_{i=1}^{N} \lambda_{i} y_{i} b\right) \\ & =-\sum_{i=1}^{N} \lambda_{i} y_{i}=0 \end{aligned}$$

From $\frac{\partial L}{\partial b}=0$ we get $\sum_{i=1}^{N} \lambda_{i} y_{i}=0$. Substituting this back into $L(\omega, b, \lambda)$:

$$\begin{aligned} L(\omega, b, \lambda) & =\frac{1}{2} w^{T} \omega+\sum_{i=1}^{N} \lambda_{i}\left[1-y_{i}\left(w^{T} x_{i}+b\right)\right] \\ & =\frac{1}{2} w^{T} \omega+\sum_{i=1}^{N} \lambda_{i}-\sum_{i=1}^{N} \lambda_{i} y_{i} w^{T} x_{i}-\sum_{i=1}^{N} \lambda_{i} y_{i} b \\ & =\frac{1}{2} w^{T} \omega+\sum_{i=1}^{N} \lambda_{i}-\sum_{i=1}^{N} \lambda_{i} y_{i} w^{T} x_{i} \\ \frac{\partial L(\omega, b, \lambda)}{\partial \omega} & =\frac{\partial}{\partial \omega}\left[\frac{1}{2} w^{T} \omega+\sum_{i=1}^{N} \lambda_{i}-\sum_{i=1}^{N} \lambda_{i} y_{i} w^{T} x_{i}\right] \\ & =\frac{1}{2} \cdot 2 \omega-\sum_{i=1}^{N} \lambda_{i} y_{i} x_{i} \end{aligned}$$

From $\frac{\partial L}{\partial \omega}=0$ we get $\omega=\sum_{i=1}^{N} \lambda_{i} y_{i} x_{i}$. Substituting this into $L(\omega, b, \lambda)$ gives:
$$\begin{aligned} L(\omega, b, \lambda)&=\frac{1}{2}\left(\sum_{i=1}^{N} \lambda_{i} y_{i} x_{i}\right)^{T}\left(\sum_{j=1}^{N} \lambda_{j} y_{j} x_{j}\right)-\sum_{i=1}^{N} \lambda_{i} y_{i}\left(\sum_{j=1}^{N} \lambda_{j} y_{j} x_{j}\right)^{T} x_{i}+\sum_{i=1}^{N} \lambda_{i}\\ &=\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_{i} \lambda_{j} y_{i} y_{j} x_{i}^{T} x_{j}-\sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_{i} \lambda_{j} y_{i} y_{j} x_{j}^{T} x_{i}+\sum_{i=1}^{N} \lambda_{i} \\ &=\sum_{i=1}^{N} \lambda_{i}-\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_{i} \lambda_{j} y_{i} y_{j} x_{i}^{T} x_{j} \end{aligned}$$
Thus the dual optimization problem is:
$$\left\{\begin{array}{c} \max \sum_{i=1}^{N} \lambda_{i}-\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_{i} \lambda_{j} y_{i} y_{j} x_{i}^{T} x_{j} \\ \text{ s.t. } \lambda_{i} \geq 0, \text{ for } \forall i=1,2, \cdots, N \\ \sum_{i=1}^{N} \lambda_{i} y_{i}=0 \end{array}\right.$$
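A minimal sketch of solving this dual with cvxpy and then recovering $w$ and $b$ from the stationarity condition $\omega=\sum_i \lambda_i y_i x_i$ (same assumptions as before: cvxpy installed, made-up separable toy data):

```python
import numpy as np
import cvxpy as cp

# Same kind of made-up separable toy data as before
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 3.0],
              [6.0, 5.0], [7.0, 7.5], [8.0, 6.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
N = len(y)

lam = cp.Variable(N)

# Dual objective: sum(lam) - 1/2 || sum_i lam_i y_i x_i ||^2
# (the double sum over x_i^T x_j is written as a squared norm to keep the problem convex)
objective = cp.Maximize(cp.sum(lam) - 0.5 * cp.sum_squares(X.T @ cp.multiply(lam, y)))
constraints = [lam >= 0, cp.sum(cp.multiply(lam, y)) == 0]
cp.Problem(objective, constraints).solve()

lam_v = lam.value
w = X.T @ (lam_v * y)            # stationarity: w = sum_i lam_i y_i x_i
sv = np.argmax(lam_v)            # any sample with lam_i > 0 lies on the margin
b = y[sv] - X[sv] @ w            # from y_sv (w . x_sv + b) = 1 and y_sv in {-1, 1}
print("lambda =", lam_v.round(3), " w =", w, " b =", b)
```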

4. Soft margin problem

If the data is linearly inseparable, we introduce slack variables:
$$\xi_{i} \geq 0$$
so that the functional margin plus the slack variable is greater than or equal to 1; the constraint then becomes
$$y_{i}\left(w x_{i}+b\right) \geq 1-\xi_{i}$$
In the objective function, the second term makes this slack (the error) as small as possible:
$$\min _{w, b} \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{N} \xi_{i}$$
The convex optimization problem is now:
$$\begin{array}{c} \min _{w, b, \xi} \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{N} \xi_{i} \\ \text{ s.t. } \quad y_{i}\left(w x_{i}+b\right) \geq 1-\xi_{i}, \quad i=1,2, \cdots, n \\ \xi_{i} \geq 0, \quad i=1,2, \cdots, n \end{array}$$
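Compared with the hard-margin sketch earlier, the only change is adding the slack variables to the objective and constraints. A minimal cvxpy version (same assumptions; this time the made-up data contains two deliberately misplaced points, so slacks are needed):

```python
import numpy as np
import cvxpy as cp

# Made-up data with one point of each class on the "wrong" side
X = np.array([[1.0, 1.0], [2.0, 2.5], [6.5, 6.0],
              [6.0, 5.0], [7.0, 7.5], [2.5, 2.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
C = 1.0

w, b = cp.Variable(2), cp.Variable()
xi = cp.Variable(len(y), nonneg=True)            # slack variables xi_i >= 0

objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
cp.Problem(objective, constraints).solve()

print("w =", w.value, " b =", b.value)
print("slacks =", xi.value.round(3))             # nonzero entries mark margin violations
```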
The corresponding Lagrangian can be written as (using multipliers $\alpha_i$ for the margin constraints and $\beta_i$ for $\xi_i \geq 0$):
$$L(w, b, \xi, \alpha, \beta)=\frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{n} \xi_{i}-\sum_{i=1}^{n} \alpha_{i}\left(y_{i}\left(w x_{i}+b\right)-1+\xi_{i}\right)-\sum_{i=1}^{n} \beta_{i} \xi_{i}$$

Likewise, taking derivatives gives the expressions for the parameters and the constraints:
$$\begin{array}{l} \frac{\partial L}{\partial w}=0 \Rightarrow w=\sum_{i=1}^{n} \alpha_{i} y_{i} x_i \\ \frac{\partial L}{\partial b}=0 \Rightarrow 0=\sum_{i=1}^{n} \alpha_{i} y_{i} \\ \frac{\partial L}{\partial \xi_i}=0 \Rightarrow C-\alpha_{i}-\beta_{i}=0 \end{array}$$
From the KKT conditions:
$$\alpha_{i}\left(y_{i}\left(w x_{i}+b\right)-1+\xi_{i}\right)=0$$

Since
$$y_{i}\left(\left(w \cdot x_{i}\right)+b\right) \geq 1-\xi_{i}$$
$\alpha_{i} \neq 0$ can hold only for samples lying exactly on the boundary. For samples that are correctly classified and on the margin, this means:

$$\begin{array}{l}y_{i}\left(\left(w \cdot x_{i}\right)+b\right)=1-\xi_{i} \\ 0 \leq \alpha_{i} \leq C, \quad \xi_{i}=0\end{array}$$
For misclassified samples:
$$\alpha_{i}=C, \quad \xi_{i}>0$$
Combining with
$$\left\{\begin{array}{c} \sum_{i=1}^{n}\left(C-\alpha_{i}-\beta_{i}\right)=0 \\ \alpha_{i} \geq 0 \\ \beta_{i} \geq 0 \end{array}\right.$$
we obtain
$$0 \leq \alpha_{i} \leq C$$

$$w_{0}=\sum_{SVs} \alpha_{i} y_{i} x_{i}, \quad \alpha_{i} \geq 0$$
The dual problem for the soft margin case is then:
$$\begin{aligned} \max W(\alpha)= & \sum_{i=1}^{l} \alpha_{i}-\frac{1}{2} \sum_{i, j=1}^{l} \alpha_{i} \alpha_{j} y_{i} y_{j}\left(x_{i} \cdot x_{j}\right) \\ & \text{ s.t. } \sum_{i=1}^{l} y_{i} \alpha_{i}=0 \\ & 0 \leq \alpha_{i} \leq C, \quad i=1,2, \ldots, l \end{aligned}$$
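A minimal cvxpy sketch of this soft-margin dual (same made-up non-separable toy data as in the primal sketch above); the comments note how the KKT conditions from this section sort the support vectors:

```python
import numpy as np
import cvxpy as cp

# Same made-up non-separable data as in the primal sketch above
X = np.array([[1.0, 1.0], [2.0, 2.5], [6.5, 6.0],
              [6.0, 5.0], [7.0, 7.5], [2.5, 2.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
C, N = 1.0, len(y)

alpha = cp.Variable(N)
objective = cp.Maximize(cp.sum(alpha)
                        - 0.5 * cp.sum_squares(X.T @ cp.multiply(alpha, y)))
constraints = [alpha >= 0, alpha <= C,                 # box constraint 0 <= alpha_i <= C
               cp.sum(cp.multiply(alpha, y)) == 0]
cp.Problem(objective, constraints).solve()

a = alpha.value
w = X.T @ (a * y)                                      # w = sum over SVs of alpha_i y_i x_i
# KKT reading: 0 < alpha_i < C  -> on the margin (xi_i = 0);
#              alpha_i = C      -> xi_i may be > 0 (inside the margin or misclassified)
print("alpha =", a.round(3), " w =", w)
print("margin SVs:", np.where((a > 1e-6) & (a < C - 1e-6))[0])
print("bound SVs (alpha = C):", np.where(a > C - 1e-6)[0])
```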

References:

course:

blog:
