The constraint γ||w|| = 1 in the derivation of the support vector machine

Many students have studied support vector machines and probably have a general understanding of the algorithm, but I suspect most people do not really understand one particular step of its derivation.
Let me briefly introduce SVM first. The core idea of SVM is to find the hyperplane in a multi-dimensional space that best separates our samples.

We all know the hyperplane; its mathematical expression is:

$$w^T x + b = 0$$
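
As a minimal illustration (a sketch with made-up numbers, not from the original post), a point is classified by which side of the hyperplane it falls on, i.e. by the sign of $w^T x + b$:

```python
import numpy as np

# Hypothetical hyperplane parameters, chosen only for illustration
w = np.array([2.0, -1.0])
b = 0.5

def side_of_hyperplane(x):
    """Return +1.0 or -1.0 depending on which side of w^T x + b = 0 the point lies."""
    return np.sign(w @ x + b)

print(side_of_hyperplane(np.array([1.0, 1.0])))   # 1.0  (w @ x + b = 1.5)
print(side_of_hyperplane(np.array([-1.0, 1.0])))  # -1.0 (w @ x + b = -2.5)
```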
So which hyperplane separates our samples best?
Look at the picture below:
We can see that there are many hyperplanes (lines, in two dimensions) that correctly separate our samples, but judging by eye, isn't the red one the most effective? OK, so how do we find that one?
[Figure: two classes of sample points with several candidate separating lines; the red line separates them best]

Next, let's look at another concept, the distance from a sample $x$ to the hyperplane:

$$d = \frac{|w^T x + b|}{\|w\|}$$

This is the standard point-to-hyperplane distance formula; we will not derive it here.
With this distance formula and a bit of common sense, we know that for any given hyperplane we can compute the distance from each sample to it, and among those distances there is always a minimum. Researchers found that the larger this minimum distance is, the better the hyperplane classifies. From this we obtain the SVM model: find a hyperplane that classifies all samples correctly and maximizes the minimum distance from the samples to the hyperplane.
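
As a small numerical sketch (the data and the two candidate hyperplanes below are made up for illustration), the quantity being maximized is the minimum of $y_i (w^T x_i + b) / \|w\|$ over all samples, often called the geometric margin:

```python
import numpy as np

# Toy linearly separable data with labels y_i in {-1, +1} (illustrative only)
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])

def geometric_margin(w, b):
    """Minimum of y_i * (w^T x_i + b) / ||w|| over all samples.
    It is positive only if the hyperplane classifies every sample correctly."""
    return np.min(y * (X @ w + b) / np.linalg.norm(w))

# Two candidate hyperplanes; SVM prefers the one with the larger minimum distance.
print(geometric_margin(np.array([1.0, 1.0]), 0.0))  # ~2.12
print(geometric_margin(np.array([1.0, 0.0]), 0.0))  # 1.0
```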
From this, we get the mathematical formulation of the support vector machine model:

$$\max_{w,b}\ \gamma \quad \text{s.t.} \quad \frac{y_i (w^T x_i + b)}{\|w\|} \ge \gamma, \quad i = 1, \dots, N$$

Note: here $\gamma$ is the minimum distance from the samples to the hyperplane.

Now we come to the real core of this post: how is the above expression transformed into the following one?

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \quad i = 1, \dots, N$$

Many students have surely seen the usual textbook explanation: scaling $(w, b)$ to $(\kappa w, \kappa b)$ does not change the hyperplane, so we are free to rescale until $\gamma\|w\| = 1$. But in fact I think this explanation is not easy for many people to understand; from the perspective of mathematical derivation it is not convincing, so let me derive it for everyone:
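
(As a quick aside, the invariance that this textbook explanation appeals to is easy to verify numerically. Here is a sketch with made-up data showing that rescaling $(w, b)$ to $(\kappa w, \kappa b)$ changes neither the hyperplane's predictions nor the geometric margin:)

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])

def geometric_margin(w, b):
    return np.min(y * (X @ w + b) / np.linalg.norm(w))

w, b = np.array([1.0, 1.0]), 0.5
for k in (1.0, 2.0, 10.0):
    # (k*w, k*b) describes the same hyperplane: the geometric margin is
    # unchanged, while the functional margin gamma*||w|| scales with k.
    print(k, geometric_margin(k * w, k * b))  # same margin for every k
```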

1. Go back to the original mathematical expression:

$$\max_{w,b}\ \gamma \quad \text{s.t.} \quad \frac{y_i (w^T x_i + b)}{\|w\|} \ge \gamma, \quad i = 1, \dots, N$$

2. Multiplying both sides of the constraint by $\|w\|$, we can transform the above problem into the following one:

$$\max_{w,b}\ \gamma \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge \gamma \|w\|, \quad i = 1, \dots, N$$

3. Then let $\gamma\|w\| = 1$, so that $\gamma = 1/\|w\|$, and the problem becomes:

$$\max_{w,b}\ \frac{1}{\|w\|} \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \quad i = 1, \dots, N$$

which is equivalent to

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \quad i = 1, \dots, N$$
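
To make this concrete, here is a minimal sketch of solving this small quadratic program directly (the data are made up, and scipy's general-purpose SLSQP solver is just one of many ways to handle it; it is not how SVM libraries actually work):

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustrative only)
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])

# Decision variables packed as v = [w_1, w_2, b]
def objective(v):
    w = v[:2]
    return 0.5 * np.dot(w, w)  # (1/2) * ||w||^2

# One inequality constraint y_i * (w^T x_i + b) - 1 >= 0 per sample
constraints = [
    {"type": "ineq", "fun": lambda v, i=i: y[i] * (X[i] @ v[:2] + v[2]) - 1.0}
    for i in range(len(X))
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, ", b =", b)
print("y_i * (w^T x_i + b) =", y * (X @ w + b))  # all >= 1
```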

Now let us discuss why letting $\gamma\|w\| = 1$ does not affect the result. Letting $\gamma\|w\| = 1$ actually restricts the value space of $w$: the $w$ we take must now satisfy $\gamma\|w\| = 1$. Since the value space of $w$ is restricted and $w$ can no longer be chosen freely, we ought to search for the optimum only among the $w$ that satisfy this condition. However, this restriction is not actually reflected anywhere in the optimization problem above. What I found is that when the above problem is solved with the Lagrange multiplier method, the solution automatically makes some of the inequality constraints hold with equality: at the optimum at least one constraint must be active (otherwise $w$ could be shrunk further to reduce $\|w\|^2$), so $\min_i y_i (w^T x_i + b) = 1$, which is exactly $\gamma\|w\| = 1$. I think this is a lucky coincidence: it just so happens that our optimization method automatically satisfies the equality $\gamma\|w\| = 1$; if the method could return a result that violated the equality, then the optimization problem above would not be sufficient.
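
This can be checked numerically. Below is a sketch (again with made-up data) that uses scikit-learn's linear SVC with a very large C to approximate the hard-margin problem; even though the equality was never imposed, the smallest value of $y_i (w^T x_i + b)$ at the solution comes out as 1:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# The equality gamma * ||w|| = 1 was never imposed, yet the active
# constraints at the optimum force min_i y_i (w^T x_i + b) = 1.
print("min_i y_i (w^T x_i + b) =", np.min(y * (X @ w + b)))  # ~1.0
```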

So I think the real optimization problem should be the following:
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \ i = 1, \dots, N, \quad \min_i y_i (w^T x_i + b) = 1$$

But because the problem is solved with the Lagrange multiplier method, the result obtained while ignoring the equality constraint below happens to satisfy it anyway. In other words, without the equality constraint, the set of $w$ being searched is larger; it contains the set of $w$ that remains after the constraint is added. Since the solution found over the larger set satisfies the equality, that solution must also be the optimal solution of the constrained problem.

$$\min_i y_i (w^T x_i + b) = 1 \quad (\text{i.e. } \gamma\|w\| = 1)$$
Let me give you an example:
Suppose you are picking peaches and there are one hundred orchards in total. You are asked to pick the best peach, and the best peach must come from one specific orchard, say orchard No. 10; if the best peach you pick is not from orchard No. 10, you fail. But you do not know in advance that the best peach must come from orchard No. 10.
So you taste the fruit from all one hundred orchards, and you find that the best one is indeed in orchard No. 10. Task completed.

That is roughly what it means. Originally you were only supposed to look for the best fruit inside orchard No. 10, but you searched every orchard, and the best fruit turned out to be in orchard No. 10 anyway. So in my opinion the SVM derivation works by a lucky coincidence, although this may well have been proved rigorously by others.

Origin blog.csdn.net/weixin_43327597/article/details/131603537