SVM (1): Linearly Separable Support Vector Machine


Foreword

  Support vector machines (SVMs) comprise a family of models of increasing complexity: linearly separable support vector machines, linear support vector machines, and nonlinear support vector machines. When the training data is linearly separable, a linear classifier is learned by hard margin maximization; this is the linearly separable support vector machine. This article covers only the linearly separable case.


1. Linearly separable support vector machine

1. Linear separability of a dataset

  Here we first give the definition of linear separability of a dataset.
Given a dataset $T=\{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}$,
where $x_i\in\mathcal{X}=R^n$, $y_i\in\{+1,-1\}$, $i=1,2,\cdots,N$, if there exists some hyperplane $S$
$$w\cdot x+b=0$$
that can completely and correctly divide the positive instance points and negative instance points of the dataset onto the two sides of the hyperplane, then the dataset $T$ is linearly separable; otherwise, the dataset $T$ is linearly inseparable.
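As a small sanity check of this definition, the sketch below (with a hypothetical 2D toy dataset and hand-picked hyperplanes, used only for illustration) tests whether a given hyperplane $(w,b)$ puts every point on the side indicated by its label:

```python
import numpy as np

# Hypothetical 2D toy dataset: positive points in the upper-right,
# negative points in the lower-left (values chosen only for illustration).
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.5, 2.0]])
y = np.array([+1, +1, -1, -1])

def separates(w, b, X, y):
    """True if y_i * (w·x_i + b) > 0 for every sample, i.e. the hyperplane
    w·x + b = 0 completely and correctly divides the two classes."""
    return bool(np.all(y * (X @ w + b) > 0))

print(separates(np.array([1.0, 1.0]), -4.5, X, y))   # True: this hyperplane separates T
print(separates(np.array([1.0, -1.0]), 0.0, X, y))   # False: this one does not
```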
  Given the dataset $T$ above, assume the training set $T$ is linearly separable. The goal of learning is to find a separating hyperplane in the feature space that assigns instances to different classes. The separating hyperplane corresponds to the equation $w\cdot x+b=0$ and is determined by the normal vector $w$ and the intercept $b$.
  Generally, when the training data is linearly separable, there are infinitely many separating hyperplanes that can correctly separate the two classes of data. As shown in the figure below:
[Figure: a linearly separable two-class dataset with several separating hyperplanes, two of which are drawn]
It can be clearly seen that countless hyperplanes can correctly separate these two classes of data (two examples are drawn in the figure).
  The linearly separable support vector machine uses margin maximization to find the optimal separating hyperplane, and in this case the solution is unique (only one hyperplane meets the requirement).
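To make this uniqueness concrete, here is a minimal sketch using scikit-learn (an assumption; the library is not part of this article). `SVC` is a soft-margin SVM, but with a very large penalty `C` on linearly separable data it approximates the hard-margin, maximum-margin solution:

```python
import numpy as np
from sklearn.svm import SVC

# Same hypothetical toy dataset as above.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.5, 2.0]])
y = np.array([+1, +1, -1, -1])

# A very large C makes the soft-margin SVC behave like a hard-margin SVM
# on linearly separable data, so the fitted hyperplane is (numerically) unique.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("w* =", clf.coef_[0])        # normal vector of the separating hyperplane
print("b* =", clf.intercept_[0])   # intercept
```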

2. Linearly Separable Support Vector Machines

The basic idea of the support vector machine is to find the separating hyperplane that correctly separates the training dataset and has the largest geometric margin.
We first introduce the concept of the geometric margin.

1. Geometric margin

  For a given training dataset $T$ and hyperplane $(w,b)$, define the geometric margin of the hyperplane $(w,b)$ with respect to a sample point $(x_i,y_i)$ as
$$\gamma_i=y_i\left(\frac{w}{\|w\|}\cdot x_i+\frac{b}{\|w\|}\right)$$
  Define the geometric margin of the hyperplane $(w,b)$ with respect to the training dataset $T$ as the minimum of the geometric margins of $(w,b)$ over all sample points $(x_i,y_i)$ in $T$, that is,
$$\gamma=\min_{i=1,2,\cdots,N}\gamma_i$$
The geometric margin of the hyperplane with respect to a sample point $(x_i,y_i)$ is in general the signed distance from the instance point to the hyperplane; when the sample point is correctly classified by the hyperplane, it is exactly the distance from the instance point to the hyperplane.
As a piece of background knowledge, the distance from a point $(x_i,y_i)$ to the line $Ax+By+C=0$ is
$$d=\frac{|Ax_i+By_i+C|}{\sqrt{A^2+B^2}}$$
Note that this distance is nonnegative, and the formula carries over unchanged to hyperplanes in higher dimensions.
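A minimal sketch of these definitions, reusing the hypothetical toy data from above: the geometric margin of a hyperplane with respect to each point, and with respect to the whole set, can be computed directly:

```python
import numpy as np

def geometric_margins(w, b, X, y):
    """Per-point geometric margins gamma_i = y_i (w·x_i + b) / ||w||
    and the margin of the hyperplane w.r.t. the whole dataset (their minimum)."""
    gammas = y * (X @ w + b) / np.linalg.norm(w)
    return gammas, gammas.min()

# Hypothetical toy data and the hyperplane x1 + x2 - 4.5 = 0 from above.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.5, 2.0]])
y = np.array([+1, +1, -1, -1])
gammas, gamma = geometric_margins(np.array([1.0, 1.0]), -4.5, X, y)

print(gammas)  # all positive, since every point is correctly classified
print(gamma)   # the geometric margin of this hyperplane with respect to T
```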

2. Margin maximization

  Margin maximization here is also called hard margin maximization. The intuitive interpretation of margin maximization is: finding the hyperplane with the largest geometric margin for the training dataset means classifying the training data with a sufficiently large degree of certainty. That is to say, not only are the positive and negative instance points separated, but even the most difficult instance points (those closest to the hyperplane) are separated with sufficient confidence. Such a hyperplane should have good classification and prediction ability for unknown new instances.

3. Maximum Margin Separating Hyperplane

  The maximum geometric margin separating hyperplane can be formulated as the following constrained optimization problem:
$$\max_{w,b}\ \gamma$$
$$\text{s.t.}\quad y_i\left(\frac{w}{\|w\|}\cdot x_i+\frac{b}{\|w\|}\right)\geqslant\gamma,\quad i=1,2,\cdots,N$$
This needs to be understood carefully: we want to maximize the geometric margin $\gamma$ of the hyperplane $(w,b)$ with respect to $T$ (where $\gamma$ is the minimum of the geometric margins of the hyperplane over all sample points in $T$), so the constraint says that the geometric margin of the hyperplane with respect to each sample point is at least $\gamma$.
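To connect this problem with the quadratic program used in the algorithm below, here is a brief sketch of the standard reduction. Write the functional margin as $\hat{\gamma}=\gamma\|w\|$; the problem then reads
$$\max_{w,b}\ \frac{\hat{\gamma}}{\|w\|}\qquad \text{s.t.}\quad y_i(w\cdot x_i+b)\geqslant\hat{\gamma},\quad i=1,2,\cdots,N$$
Rescaling $(w,b)$ by a constant $\lambda>0$ rescales $\hat{\gamma}$ by $\lambda$ without changing the hyperplane, so we may fix $\hat{\gamma}=1$; maximizing $\frac{1}{\|w\|}$ is then equivalent to minimizing $\frac{1}{2}\|w\|^2$.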

2. Linearly separable support vector machine learning algorithm: the maximum margin method

  Above we obtained the constrained optimization problem for the maximum geometric margin separating hyperplane. If the solution of this constrained optimization problem is $w^*,b^*$, then the maximum margin separating hyperplane is $w^*\cdot x+b^*=0$.
To solve this optimization problem, we introduce another concept: the convex optimization problem.
  A convex optimization problem refers to an optimization problem of the following form:
$$\min_{w}\ f(w)$$
$$\text{s.t.}\quad g_i(w)\leqslant 0,\quad i=1,2,\cdots,k$$
$$\qquad\ \ h_i(w)=0,\quad i=1,2,\cdots,l$$
where the objective function $f(w)$ and the constraint functions $g_i(w)$ are convex functions and the $h_i(w)$ are affine functions.

1. The maximum margin method

Input: a linearly separable dataset $T$
Output: the maximum margin separating hyperplane and the classification decision function

  1. Construct and solve the constrained optimization problem (a numerical sketch of solving it follows the algorithm below):
    $$\min_{w,b}\ \frac{1}{2}\|w\|^2$$
    $$\text{s.t.}\quad y_i(w\cdot x_i+b)-1\geqslant 0,\quad i=1,2,\cdots,N$$
    Why is the optimization problem here not the one listed above? Because the earlier problem reduces to this one (see the sketch of the reduction above): fixing the functional margin at 1, maximizing $\frac{1}{\|w\|}$ is equivalent to minimizing $\frac{1}{2}\|w\|^2$.
    Obtain the optimal solution $w^*,b^*$.
  2. From this, obtain the separating hyperplane:
    $$w^*\cdot x+b^*=0$$
    and the classification decision function:
    $$f(x)=\mathrm{sign}(w^*\cdot x+b^*)$$
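As a purely numerical illustration (not the method used in this series — the next section solves the problem via Lagrangian duality), the quadratic program in step 1 can be handed to a generic convex solver. The sketch below assumes the cvxpy library and the hypothetical toy data from earlier:

```python
import cvxpy as cp
import numpy as np

# Hypothetical linearly separable toy data.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.5, 2.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])

w = cp.Variable(X.shape[1])
b = cp.Variable()

# min (1/2)||w||^2   s.t.   y_i (w·x_i + b) - 1 >= 0
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w* =", w.value, " b* =", b.value)
# Classification decision function: f(x) = sign(w*·x + b*)
```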

2. Support vectors and margin boundaries

  In the linearly separable case, the instances among the sample points of the training dataset that are closest to the separating hyperplane are called support vectors. The support vectors are the points for which the constraint holds with equality, that is,
$$y_i(w\cdot x_i+b)-1=0$$
  For the positive example points with $y_i=+1$, the support vectors lie on the hyperplane
$$H_1:\ w\cdot x+b=1$$
and for the negative example points with $y_i=-1$, the support vectors lie on the hyperplane
$$H_2:\ w\cdot x+b=-1$$
As shown in the figure below, the points on $H_1$ and $H_2$ are the support vectors.
[Figure: separating hyperplane $w\cdot x+b=0$ with the margin boundaries $H_1$ and $H_2$; the points lying on $H_1$ and $H_2$ are the support vectors]
Because $H_1$ and $H_2$ have the same normal vector $w$, they are parallel, and no instance points fall between them. The separating hyperplane is parallel to them and lies midway between them. The width of the strip between $H_1$ and $H_2$, i.e. their distance, is called the margin, and it equals $\frac{2}{\|w\|}$. $H_1$ and $H_2$ are called the margin boundaries.
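A quick way to see the $\frac{2}{\|w\|}$ width: take any point $x_1$ on $H_1$, so that $w\cdot x_1+b=1$. By the point-to-hyperplane distance formula given earlier, its distance to $H_2:\ w\cdot x+b=-1$ is
$$\frac{|w\cdot x_1+b-(-1)|}{\|w\|}=\frac{2}{\|w\|}$$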
Only the support vectors play a role in determining the separating hyperplane; the other instance points do not. If a support vector is moved, the solution changes; if other instance points are moved (while staying outside the margin boundaries), or even removed, the solution does not change. Since the support vectors play the decisive role in determining the separating hyperplane, this classification model is called the support vector machine.
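A minimal sketch of this "only the support vectors matter" observation, again assuming scikit-learn and a hypothetical toy dataset: refitting on the support vectors alone reproduces (numerically) the same hyperplane:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two separated clusters plus one far-away point per class.
X = np.array([[3.0, 3.0], [4.0, 3.0], [6.0, 6.0],    # positive class
              [1.0, 1.0], [0.5, 2.0], [-2.0, -2.0]])  # negative class
y = np.array([+1, +1, +1, -1, -1, -1])

clf_full = SVC(kernel="linear", C=1e6).fit(X, y)

# Refit using only the support vectors: points outside the margin boundaries
# do not constrain the optimum, so the solution should be essentially unchanged.
sv = clf_full.support_                      # indices of the support vectors
clf_sv = SVC(kernel="linear", C=1e6).fit(X[sv], y[sv])

print(clf_full.coef_, clf_full.intercept_)
print(clf_sv.coef_, clf_sv.intercept_)
```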
  We have introduced the specific steps of the linearly separable support vector machine algorithm and some related concepts (support vectors, margin boundaries), but we have not yet introduced how to solve the constrained optimization problem above. In the next section, we will show how to solve it using Lagrangian duality to obtain $w^*,b^*$.


Summary

In this section, we introduced the linearly separable support vector machine, gave its optimization problem, and introduced the concepts of the geometric margin, linearly separable datasets, support vectors, and margin boundaries.

References

Li Hang. Machine Learning Methods.
