SVM Hinge Loss Function

While learning support vector machines, we learn that the SVM loss function is the hinge loss function. As for the name, Li Hang explains in "Statistical Learning Methods": because the function is shaped like a hinge, it is called the hinge loss function. The figure below shows the graph of the hinge loss function (taken from "Statistical Learning Methods"):

[Figure: graph of the hinge loss function, from "Statistical Learning Methods"]

I had never really understood the significance of this loss function before. After watching the relevant parts of Andrew Ng's "Machine Learning" course and rereading the corresponding sections of "Statistical Learning Methods", here is my own understanding of the hinge loss function.

The horizontal axis represents the functional margin y(wx + b). We can understand the functional margin in two ways:

1) Sign (positive or negative)

When a sample is correctly classified, y(wx + b) > 0; when it is misclassified, y(wx + b) < 0.

2) Magnitude

The absolute value |y(wx + b)| reflects how far the sample is from the decision boundary: the larger |y(wx + b)|, the farther the sample lies from the decision boundary.

Therefore:

When y(wx + b) > 0, a larger |y(wx + b)| means the decision boundary discriminates the sample better (it is correctly classified with more room to spare).

When y(wx + b) < 0, a larger |y(wx + b)| means the decision boundary discriminates the sample worse (it is misclassified more severely). A small numeric sketch follows this list.
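To make the two points above concrete, here is a minimal Python sketch. The weight vector w, bias b, and sample points are made-up values for illustration, not the result of training an SVM:

import numpy as np

# Made-up parameters for illustration only (not a trained SVM).
w = np.array([1.0, -1.0])
b = 0.5

# Three samples, all labeled +1.
X = np.array([[3.0, 0.0],    # far on the correct side
              [0.6, 0.0],    # barely on the correct side
              [0.0, 2.0]])   # on the wrong side
y = np.array([1.0, 1.0, 1.0])

# Functional margin: y * (w . x + b)
margins = y * (X @ w + b)
print(margins)  # [ 3.5  1.1 -1.5]
# Sign: 3.5 and 1.1 are positive (correct); -1.5 is negative (misclassified).
# Magnitude: 3.5 lies farther from the decision boundary than 1.1.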

From the figure, we can see:

1) 0-1 loss

When the sample is correctly classified, the loss is 0; when it is misclassified, the loss is 1.

2) Perceptron loss function

When the sample is correctly classified, the loss is 0; when it is misclassified, the loss is -y(wx + b) (equivalently, max(0, -y(wx + b))).

3) Hinge loss function

When the sample is correctly classified and its functional margin is at least 1, the hinge loss is 0; otherwise, the loss is 1 - y(wx + b). Equivalently, the hinge loss is max(0, 1 - y(wx + b)).

By comparison, the hinge loss is 0 only when the sample is not merely correctly classified but classified with enough confidence, that is, with a functional margin of at least 1. In other words, the hinge loss places higher demands on learning; see the sketch after this paragraph.
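As a minimal sketch of this comparison, the three losses can be written as functions of the functional margin m = y(wx + b) and evaluated at a few arbitrary margin values:

import numpy as np

def zero_one_loss(m):
    # 0-1 loss: 1 if misclassified (m < 0), else 0
    return np.where(m < 0, 1.0, 0.0)

def perceptron_loss(m):
    # Perceptron loss: -m if misclassified, else 0
    return np.maximum(0.0, -m)

def hinge_loss(m):
    # Hinge loss: 0 only once the functional margin reaches 1
    return np.maximum(0.0, 1.0 - m)

m = np.array([-2.0, -0.5, 0.0, 0.5, 1.0, 2.0])
print(zero_one_loss(m))    # [1.  1.  0.  0.  0.  0.]
print(perceptron_loss(m))  # [2.  0.5 0.  0.  0.  0.]
print(hinge_loss(m))       # [3.  1.5 1.  0.5 0.  0.]
# At m = 0.5 the sample is correctly classified, yet the hinge loss is
# still positive: it keeps pushing until the margin is at least 1.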


The derivation in "Statistical Learning Methods" relating the SVM to the hinge loss (the original derivation appears as images that are not reproduced here):

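Since the original images are missing, here is a hedged reconstruction of the standard equivalence that derivation establishes (the notation is assumed, not copied from the original): minimizing the regularized hinge loss

\[
\min_{w,b}\ \sum_{i=1}^{N} \bigl[\, 1 - y_i (w \cdot x_i + b) \,\bigr]_{+} + \lambda \lVert w \rVert^2
\]

is equivalent to the soft-margin SVM primal problem

\[
\min_{w,b,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i
\qquad \text{s.t.}\quad y_i (w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \dots, N,
\]

where \([z]_{+} = \max(0, z)\) denotes the positive part and \(\lambda = 1/(2C)\). The key step is setting \(\xi_i = [\,1 - y_i (w \cdot x_i + b)\,]_{+}\), so the slack variables absorb exactly the hinge loss of each sample.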

References:

Li Hang, "Statistical Learning Methods"

https://blog.csdn.net/lz_peter/article/details/79614556

https://blog.csdn.net/qq_26598445/article/details/80901249

