A Simple Introduction to SVM Theory

Margins and Support Vectors

Let's start with a simple example. Suppose we have a data set containing the heights and weights of boys and girls. We would like to train a model on this data set so that, given a new person's height and weight, the model can decide whether that person is a boy or a girl. Our idea is to plot the data in a two-dimensional plane, with height on the horizontal axis and weight on the vertical axis, so that we can see how it is distributed. (Orange points are girls, blue points are boys.)

[Figure: scatter plot of the height/weight data; orange points are girls, blue points are boys]

OK, now that we have such a distribution, we wonder whether we can draw a line that separates the boys from the girls. Then, when a new student arrives, we can decide whether they are a boy or a girl simply from which side of the line their point falls on.
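
As a concrete illustration of this idea (not part of the original post), here is a minimal sketch using scikit-learn's `SVC` with a linear kernel on a small made-up height/weight data set; all numbers and labels below are invented purely for demonstration.

```python
# A minimal sketch: fit a linear SVM on made-up height/weight data
# and classify a new student. All numbers here are invented examples.
import numpy as np
from sklearn.svm import SVC

# Each row is (height in cm, weight in kg); label 0 = girl, 1 = boy
X = np.array([
    [158, 48], [162, 52], [165, 55], [160, 50],   # girls
    [175, 68], [180, 75], [172, 65], [178, 72],   # boys
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)   # linear decision boundary w^T x + b = 0
clf.fit(X, y)

new_student = np.array([[170, 60]])
print(clf.predict(new_student))     # 0 -> girl, 1 -> boy
```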

But here a problem arises. If you look at the picture above, you cannot draw a straight line that separates the two classes completely. This leads us to the two notions below.

Hard margin

[Figure: two classes split cleanly by a line, with every point on its correct side]

In the picture above, the two classes are separated completely, with no point allowed on the wrong side. This situation is called a hard margin.

Soft margin

[Figure: a separating line that tolerates a few points on the wrong side]

This situation is called a soft margin. We allow a few such abnormal points to appear, but each of them incurs a loss, so we need a method that balances the margin against this loss and makes the split more reasonable.

The soft margin is a relatively involved topic; we will come back to it in detail later.

Support vectors

[Figure: the separating line with the support vectors marked by asterisks]

The points marked with asterisks are the support vectors: the points closest to the separating line, through which the two margin boundaries pass.
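
As a hedged illustration (not from the original post), scikit-learn's `SVC` exposes the support vectors of a fitted model, so you can see exactly which points play this role. The data here reuses the made-up height/weight example above.

```python
# Sketch: inspect which training points end up as support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([
    [158, 48], [162, 52], [165, 55], [160, 50],   # girls
    [175, 68], [180, 75], [172, 65], [178, 72],   # boys
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # a very large C approximates a hard margin
clf.fit(X, y)

print(clf.support_vectors_)   # the points lying on the margin boundaries
print(clf.support_)           # their indices in X
```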

SVM basic model

For now we will only discuss the hard-margin case; the soft margin is discussed further below.

If we want to classify well, the key point is that we want to maximize r, where r is the distance from a point to the line; in other words, the larger the margin, the better. This raises the question of how to maximize r: we first need to write down an expression for r, and then we can find its maximum, for example via calculus.

[Figure: the separating line and the distance r from the nearest points to it]

First, let us write down the equation of the hyperplane. Our hyperplane can be expressed as

$$w^T x + b = 0$$

Let us briefly explain this equation:

Here $x$ is not the abscissa but the point $x = (x_1, x_2)$ itself, and $w$ likewise contains two parameters. The transpose symbol is there so that we can use vector multiplication. We can use the equation of a line as an analogy: if $w = (a, b)^T$ and $x = (x, y)^T$, then $w^T x + c = ax + by + c = 0$, which is the familiar equation of a straight line.

In this way a hyperplane can be represented uniquely. We write $w^T$ this way to stay compatible with higher-dimensional situations: if the data has more dimensions, $w$ simply has more components than in the two-dimensional case. For example, if each sample has three features, the dividing hyperplane is a plane in three-dimensional space; with four features it is a three-dimensional hyperplane, and so on, but in every case it can be written compactly as $w^T x + b = 0$. In our example we are in a two-dimensional space, so the equation can be written as $ax + by + c = 0$. (It does not matter if this paragraph is not entirely clear; it should make sense after reading the rest.)

First of all, we have the formula for the distance from a point to the hyperplane:

$$r = \frac{|w^T x + b|}{\lVert w \rVert}$$
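
As a small, hedged illustration (not part of the original post), this distance formula is easy to compute directly with NumPy; the values of w, b and the point below are arbitrary examples.

```python
# Sketch: distance from a point to the hyperplane w^T x + b = 0.
import numpy as np

w = np.array([2.0, -1.0])   # example weights (arbitrary)
b = 3.0                     # example bias (arbitrary)
x = np.array([1.0, 4.0])    # an example point

decision_value = w @ x + b                      # w^T x + b
r = abs(decision_value) / np.linalg.norm(w)     # |w^T x + b| / ||w||
print(decision_value, r)
```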

So the question is: knowing the distance formula, how do we actually determine this plane?

If we want to classify correctly, the two kinds of data must lie on opposite sides of the line. This is actually convenient to express, because $w^T x + b$ has opposite signs on the two sides of the plane. We only need the signs of the two classes to differ: if the two classes give opposite signs (they lie in opposite directions relative to the plane), the plane is a reasonable separator.

[Figure: the two classes lying on opposite sides of the separating plane, giving opposite signs of $w^T x + b$]

This is where the support vectors come in: they are the vectors closest to the plane, one on each side. As for why the margin works out to 2: both sides of the equation $w^T x + b = 0$ can be scaled by any constant (both $w$ and $b$ are free to rescale), so for convenience we fix the scale so that the support vectors satisfy $w^T x + b = \pm 1$. The two margin boundaries $w^T x + b = 1$ and $w^T x + b = -1$ are then a distance $\frac{2}{\lVert w \rVert}$ apart, which is the 2 in the final margin.

In this way our earlier problem becomes: subject to the classification constraints (the two kinds of points lie on the correct sides of the line), maximize the margin. Mathematically this can be expressed as

[Figure: the margin-maximization problem written as a constrained optimization]
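
For reference, the standard hard-margin formulation (which the original figure presumably showed) is the following constrained optimization, where $(x_i, y_i)$ are the training points with labels $y_i \in \{-1, +1\}$:

$$\max_{w,\,b} \; \frac{2}{\lVert w \rVert} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \quad i = 1, \dots, m$$

which is usually rewritten equivalently as

$$\min_{w,\,b} \; \frac{1}{2} \lVert w \rVert^2 \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \quad i = 1, \dots, m.$$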

I will not go through the derivation here, because it takes a long time and requires a fair amount of theoretical background to follow. This blog is only an introduction and will not go too deep; if you are interested, the open courses from National Taiwan University or Stanford University cover it in detail and with rigor.

Nonlinearly separable problems

What if there is no way to find a straight line that separates them?

For example: (a deliberately troublesome case, right?)

[Figure: a data set that no straight line can separate]

We can do this: add one more dimension and make the data three-dimensional. Doesn't that solve it?

[Figures: the same data lifted into three dimensions, where a plane can now separate the two classes]

Brilliant, isn't it?
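
As a hedged sketch (not from the original post), this dimension-adding trick can be demonstrated with scikit-learn's `make_circles` data: a circle inside a ring is not linearly separable in 2D, but adding $x_1^2 + x_2^2$ as a third coordinate makes a plane sufficient.

```python
# Sketch: lift 2D circular data into 3D so a plane can separate it.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Add a third feature z = x1^2 + x2^2 (squared distance from the origin).
z = (X ** 2).sum(axis=1, keepdims=True)
X3d = np.hstack([X, z])

clf = SVC(kernel="linear")   # a plane in the lifted 3D space
clf.fit(X3d, y)
print(clf.score(X3d, y))     # close to 1.0: the lifted data is separable
```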

The idea is this. Suppose the inner product of $x_1$ and $x_2$ is $\langle x_1, x_2 \rangle$.

There is some mapping $\phi$ that transforms $x$ into a high-dimensional $\phi(x)$; the inner product in the high-dimensional space is then $\langle \phi(x_1), \phi(x_2) \rangle$.

Now suppose there is a function $K$ such that $K(x_1, x_2) = \langle \phi(x_1), \phi(x_2) \rangle$.

This method is called the kernel trick, and $K$ is called the kernel function.

After mapping to the higher-dimensional space, we can carry out the same linear separation as before; the kernel function lets us compute the needed inner products without ever constructing $\phi(x)$ explicitly.
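
As a hedged illustration (not from the original post), here is a small numerical check that a kernel function really does equal an inner product in a lifted space, using the quadratic kernel $K(x_1, x_2) = \langle x_1, x_2 \rangle^2$ and its explicit feature map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ for 2D inputs.

```python
# Sketch: verify K(a, b) = <phi(a), phi(b)> for the quadratic kernel in 2D.
import numpy as np

def phi(x):
    """Explicit feature map for the quadratic kernel on 2D inputs."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def K(a, b):
    """Quadratic kernel: the squared ordinary inner product."""
    return (a @ b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

print(K(a, b))            # 16.0
print(phi(a) @ phi(b))    # 16.0 as well: same value, no explicit lift needed
```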

Soft margin

This approach allows some points to be on the wrong side, but we then have to define a cost function and minimize the total cost.
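
For reference, and hedged since the original figures are missing here, the standard soft-margin formulation introduces slack variables $\xi_i \ge 0$ for the violations and a parameter $C$ that trades off margin width against the total violation:

$$\min_{w,\,b,\,\xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0.$$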

[Figures: the soft-margin case, showing points that violate the margin and the cost they incur]
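
As a hedged practical note (not from the original post), in scikit-learn this trade-off is exactly the `C` parameter of `SVC`: a small C tolerates more violations in exchange for a wider margin, while a large C approaches the hard-margin behaviour.

```python
# Sketch: the C parameter controls the margin-vs-violation trade-off.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two slightly overlapping clusters, so a few violations are unavoidable.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(C, clf.n_support_, clf.score(X, y))
# Smaller C -> wider margin, more support vectors, more tolerated errors;
# larger C -> narrower margin, closer to the hard-margin solution.
```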

Origin blog.csdn.net/qq_52380836/article/details/128062163