[Machine Learning Core Summary] What is SVM (Support Vector Machine)

Suppose you are handed a new piece of fruit and want to know whether it is a pear or an apple. Besides drawing a circle around its nearest neighbors, as KNN does, is there another good way to decide?

One good option is to draw a line that splits the space into the region where the apples lie and the region where the pears lie. When a new sample falls on the apple side of the line, we classify it as an apple; otherwise we classify it as a pear.
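
The side test can be written as the sign of a linear score. The following is a minimal sketch, assuming each fruit is described by two numeric features and that the line is given by hypothetical coefficients `w` and `b` (neither the features nor the values come from the original article):

```python
def score(w, b, x):
    """Linear score w.x + b; its sign tells which side of the line x is on."""
    return w[0] * x[0] + w[1] * x[1] + b

def classify(w, b, x):
    # Assumption: the positive side of the line is the apple side.
    return "apple" if score(w, b, x) > 0 else "pear"

# Hypothetical line x0 + x1 - 10 = 0.
w, b = (1.0, 1.0), -10.0

print(classify(w, b, (7.0, 6.0)))  # score is 3.0, positive side
print(classify(w, b, (3.0, 4.0)))  # score is -3.0, negative side
```

Classifying a new sample therefore costs only one dot product, regardless of how many training samples were used to find the line.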

The method that finds this line is the SVM, or Support Vector Machine.

There are many ways to draw such a line, though. Which one is the most suitable?

Besides the boundary itself, the distance between a sample and the line is also meaningful: it indicates how confident the classification of that sample is.

Taking the apple side as an example, the apple farthest from the line is the one we are most confident is an apple, and the closer a sample lies to the line, the less confident we are. Our goal is to find the line between the two classes that gives the highest classification confidence across all samples.
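
The distance used as a confidence measure here is the ordinary point-to-line distance. A minimal sketch, assuming a hypothetical line `w`, `b` and two made-up sample points; note that this is a raw distance, not a calibrated probability:

```python
import math

def signed_distance(w, b, x):
    """Distance from point x to the line w.x + b = 0.

    Positive on the side w points to (assumed here to be the apple side),
    negative on the other side.
    """
    norm = math.hypot(w[0], w[1])
    return (w[0] * x[0] + w[1] * x[1] + b) / norm

# Hypothetical line 3*x0 + 4*x1 - 25 = 0 and two apple-side points.
w, b = (3.0, 4.0), -25.0
near = (4.0, 4.0)
far = (7.0, 6.0)

print(signed_distance(w, b, near))  # small distance: low confidence
print(signed_distance(w, b, far))   # large distance: high confidence
```

Dividing by the length of `w` matters: without it, scaling `w` and `b` by the same factor would change the score even though the line itself stays the same.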

It is not necessary to calculate every distance. It is enough to find the samples closest to the line and place the line so that they are as far from it as possible. This distance is called the margin (or classification interval), and the samples that determine the line are called support vectors, which is where the name support vector machine comes from.
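
To make the idea concrete, the sketch below scores a few candidate lines by the distance to their closest sample and keeps the one with the largest value. The data points and the three candidate lines are hypothetical, and comparing a handful of candidates only illustrates the criterion; a real SVM solves an optimization problem over all possible lines.

```python
import math

# Hypothetical 2-D samples: label +1 = apple, -1 = pear.
samples = [
    ((6.0, 6.0), 1), ((7.0, 5.0), 1), ((8.0, 8.0), 1),
    ((2.0, 2.0), -1), ((3.0, 1.0), -1), ((1.0, 3.0), -1),
]

def distances(line):
    """Distance of every sample to the line, positive when correctly classified."""
    (w0, w1), b = line
    norm = math.hypot(w0, w1)
    return [y * (w0 * x0 + w1 * x1 + b) / norm for (x0, x1), y in samples]

def margin(line):
    """The margin of a line is the distance to its closest sample."""
    return min(distances(line))

def support_vectors(line, tol=1e-9):
    """Samples whose distance equals the margin (within a tolerance)."""
    m = margin(line)
    return [x for (x, _), d in zip(samples, distances(line)) if abs(d - m) <= tol]

# Hypothetical candidate lines, each given as ((w0, w1), b).
candidates = {
    "A": ((1.0, 1.0), -8.0),
    "B": ((1.0, 0.0), -4.0),
    "C": ((1.0, 1.0), -10.0),
}

best = max(candidates, key=lambda name: margin(candidates[name]))
print(best, round(margin(candidates[best]), 3))
print(support_vectors(candidates[best]))
```

All three candidates classify every sample correctly, so accuracy alone cannot choose between them; the margin can. The sample `(8.0, 8.0)` is not among the support vectors of the chosen line, so moving it farther away would not change the result.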

What if the two classes overlap? Then we focus on the samples that the line cannot classify correctly, measure how far each of them lies on the wrong side, and look for the line that keeps this distance as small as possible. This is known as a soft margin.
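
One common way to express this trade-off is the hinge loss: each sample contributes a penalty that is zero when it is safely on its own side and grows with how far it falls on the wrong side. The sketch below is an illustration under assumptions: the data, the line `w`, `b`, and the penalty weight `C` are hypothetical and do not appear in the original article.

```python
# Hypothetical overlapping data: one apple (+1) sits among the pears (-1).
samples = [
    ((6.0, 6.0), 1), ((7.0, 5.0), 1), ((3.0, 3.0), 1),
    ((2.0, 2.0), -1), ((3.0, 1.0), -1),
]

def slack(w, b, x, y):
    """Hinge loss: 0 when y * score >= 1, otherwise 1 - y * score."""
    s = w[0] * x[0] + w[1] * x[1] + b
    return max(0.0, 1.0 - y * s)

def soft_margin_objective(w, b, C=1.0):
    """0.5 * ||w||^2 (prefers a wide margin) plus C times the total slack."""
    regularizer = 0.5 * (w[0] ** 2 + w[1] ** 2)
    return regularizer + C * sum(slack(w, b, x, y) for x, y in samples)

# Hypothetical line 0.5*x0 + 0.5*x1 - 4 = 0.
w, b = (0.5, 0.5), -4.0

print([slack(w, b, x, y) for x, y in samples])
print(soft_margin_objective(w, b, C=1.0))
```

Only the misplaced apple at `(3.0, 3.0)` has nonzero slack. A larger `C` punishes such violations more heavily, while a smaller `C` tolerates them in exchange for a wider margin.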

What if the samples are distributed so that no straight line can separate them? Then we apply a transformation that maps them into a space where a straight line can separate them, and find the classification line there.
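
A small worked example of such a transformation, under assumptions: the one-dimensional data below is made up, and the mapping `x -> (x, x*x)` is just one illustrative choice. In practice SVMs usually perform this step implicitly through a kernel function rather than computing the new coordinates.

```python
# Hypothetical 1-D data: pears (-1) in the middle, apples (+1) on both sides.
xs = [-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 4.0]
labels = [1, 1, -1, -1, -1, 1, 1]

def label_changes(values, labels):
    """Count label switches when samples are sorted by value.

    A single threshold can separate the classes only if this is at most 1.
    """
    ordered = [lab for _, lab in sorted(zip(values, labels))]
    return sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)

def phi(x):
    """Illustrative feature map from 1-D to 2-D."""
    return (x, x * x)

mapped = [phi(x) for x in xs]
squared = [p[1] for p in mapped]

print(label_changes(xs, labels))       # more than 1: no threshold on x works
print(label_changes(squared, labels))  # at most 1: a threshold on x*x works

# In the mapped space the horizontal line x*x = 4 separates the classes.
predictions = [1 if p[1] - 4.0 > 0 else -1 for p in mapped]
print(predictions == labels)
```

The classes were never separable by a single cut on `x`, but after the mapping a straight line in the new space does the job.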

Before the rise of deep learning, random forest and SVM were among the strongest classification methods. Because the boundary is determined only by the support vectors, SVM depends little on the remaining samples, is fairly resistant to overfitting, and can achieve good results with small samples. SVM has been widely used for text classification, spam recognition, image classification, and even protein classification, and it remains in use today.

Origin blog.csdn.net/RuanJian_GC/article/details/131544240