Chapter 7: Support Vector Machine (SVM): 1. sklearn.svm.SVC 1.1 Code 2. Kernel function 3. Soft margin

1. Support Vector Machine (SVM)

[figure omitted]
The main goal is to find the decision boundary that separates the two classes.

[figures omitted]

1. sklearn.svm.SVC

For example, an SVM can separate two classes with a line in the two-dimensional plane. The line divides the plane in two, and the points on either side are labeled y = -1 and y = +1. The corresponding decision function is f(x) = w·x + b: a point's label is the sign of f(x).

That is, as shown in the figure: [figure omitted]
Here w and x are both vectors.
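As an illustrative sketch of classifying by the sign of w·x + b (the weight vector and bias below are made-up values for demonstration, not fitted ones):

```python
import numpy as np

# Hypothetical weights and bias, chosen only for illustration.
w = np.array([1.0, -1.0])
b = 0.5

def predict(x):
    """Classify a point by the sign of w.x + b: +1 on one side, -1 on the other."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(predict(np.array([2.0, 0.0])))  # w.x + b = 2.5 > 0  -> +1
print(predict(np.array([0.0, 2.0])))  # w.x + b = -1.5 < 0 -> -1
```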

1.1 Code

  • Dataset code:
from sklearn.datasets import make_blobs as mb
from sklearn.svm import SVC as svc
import matplotlib.pyplot as pt
import numpy as np

# 100 samples in two tight clusters; fixed seed for reproducibility
x, y = mb(n_samples=100, centers=2, random_state=0, cluster_std=0.5)
pt.scatter(x[:, 0], x[:, 1], c=y, s=100, cmap="rainbow")

Dataset image: [figure omitted]

  • Code for drawing the grid:
ax = pt.gca()  # get the current axes object
xlim = ax.get_xlim()
ylim = ax.get_ylim()
print(xlim)

# Build the grid
axisx = np.linspace(xlim[0], xlim[1], 30)  # 30 evenly spaced values from left to right
axisy = np.linspace(ylim[0], ylim[1], 30)
axisy, axisx = np.meshgrid(axisy, axisx)  # stack the two feature vectors into 30x30 coordinate matrices
xy = np.vstack([axisx.ravel(), axisy.ravel()]).T  # flatten both matrices and pair them: 900 coordinate pairs
print(xy.shape)  # (900, 2)

  • SVC code:
cf = svc(kernel="linear").fit(x, y)
p = cf.decision_function(xy).reshape(axisx.shape)  # 900 values reshaped to 30x30
print(p)  # the absolute value reflects the distance from the decision boundary

# Draw the decision boundary (level 0) and the two margins (levels -1 and +1)
ax.contour(axisx, axisy, p, colors="k", levels=[-1, 0, 1], alpha=0.5, linestyles=["--", "-", "--"])
ax.set_xlim(xlim)
ax.set_ylim(ylim)
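As a supplementary sketch (not in the original post), the fitted classifier also exposes the support vectors and the line's parameters directly; refitting on the same blob data:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Same dataset as above
x, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.5)
cf = SVC(kernel="linear").fit(x, y)

# The support vectors are the training points lying on the margin boundaries.
print(cf.support_vectors_.shape)  # (n_support_vectors, 2)
print(cf.coef_, cf.intercept_)    # w and b of the separating line
```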

Result: [figure omitted]

2. Kernel function
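The body of this section is not included in the source. As a minimal sketch of what a kernel buys you: a linear kernel cannot separate concentric circles, while the RBF kernel handles them easily (the `make_circles` parameters below are assumptions chosen for illustration).

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space.
x, y = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)

# Compare training accuracy of a linear kernel vs. the RBF kernel.
linear_score = SVC(kernel="linear").fit(x, y).score(x, y)
rbf_score = SVC(kernel="rbf").fit(x, y).score(x, y)
print(linear_score, rbf_score)  # the RBF kernel separates the rings far better
```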

3. Soft margin

[figure omitted]
At this point the decision boundary no longer simply maximizes the margin: with soft-margin data, the larger the margin, the more samples are misclassified, so we need to balance "maximizing the margin" against "the number of misclassified samples". The parameter C trades off two goals that cannot both be achieved perfectly, correctly classifying the training samples and maximizing the margin of the decision function, in the hope of finding the balance point that gives the best model.

Code: [figures omitted; the original post shows this code only as images]
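Since the original code survives only as images, here is a hedged sketch of how C might be explored on overlapping data (the `cluster_std` value below is an assumption so that a soft margin actually matters):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters: no line classifies everything correctly.
x, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=1.5)

for C in (0.01, 1, 100):
    cf = SVC(kernel="linear", C=C).fit(x, y)
    # Smaller C tolerates more margin violations (wider margin, more support
    # vectors); larger C penalizes misclassification more heavily.
    print(C, cf.score(x, y), len(cf.support_))
```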

Origin blog.csdn.net/qq_53982314/article/details/131286193