1. Support Vector Machine (SVM)
The central task is to find the decision boundary that separates the classes with the maximum margin.
Class:
1. sklearn.svm.SVC
For example, an SVM separating two categories on a two-dimensional plane finds a line that divides the plane, so that the points on either side are labeled y = -1 and y = +1. The separating line (hyperplane) is:
w · x + b = 0
As shown in the figure, w and x here are both vectors.
1.1 Code
- Dataset code:
from sklearn.datasets import make_blobs as mb
from sklearn.svm import SVC as svc
import matplotlib.pyplot as pt
import numpy as np
x,y=mb(n_samples=100,centers=2,random_state=0,cluster_std=0.5)
pt.scatter(x[:,0],x[:,1],c=y,s=100,cmap="rainbow")
Dataset image:
- Code for drawing the grid:
ax = pt.gca()  # get the current Axes object
xlim = ax.get_xlim()
ylim = ax.get_ylim()
print(xlim)
# draw the grid
axisx = np.linspace(xlim[0], xlim[1], 30)  # take 30 values from left to right to build the grid
axisy = np.linspace(ylim[0], ylim[1], 30)
axisy, axisx = np.meshgrid(axisy, axisx)  # turn the feature vectors into coordinate matrices, i.e. stack x and y into (30, 30) grids
xy = np.vstack([axisx.ravel(), axisy.ravel()]).T  # flatten both matrices to 1-D and pair them up: 900 coordinate pairs
print(xy.shape)  # (900, 2)
- svc code:
cf = svc(kernel="linear").fit(x, y)
p = cf.decision_function(xy).reshape(axisx.shape)  # reshape the 900 values back to (30, 30)
print(p)  # the absolute value reflects how far each point is from the boundary
ax.contour(axisx, axisy, p, colors="k", levels=[-1, 0, 1], alpha=0.5, linestyles=["--", "-", "--"])
ax.set_xlim(xlim)
ax.set_ylim(ylim)
result:
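The fitted model also exposes the support vectors themselves. A sketch (assuming the same dataset as above) that highlights them with the `support_vectors_` attribute of the fitted `SVC`:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
import matplotlib.pyplot as plt

x, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.5)
cf = SVC(kernel="linear").fit(x, y)

# support_vectors_ holds the training points that lie on the margin
print(cf.support_vectors_.shape)
plt.scatter(x[:, 0], x[:, 1], c=y, s=100, cmap="rainbow")
plt.scatter(cf.support_vectors_[:, 0], cf.support_vectors_[:, 1],
            s=250, facecolors="none", edgecolors="k")
```

Only these points determine the boundary; moving any other sample (outside the margin) leaves the line unchanged.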
2. Kernel function
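The `kernel` parameter of `SVC` selects the kernel function. A sketch (the `make_circles` dataset and hyperparameters are my own illustrative choices) comparing the linear and RBF kernels on data that no straight line can separate:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# concentric circles: not separable by a straight line in the original space
x, y = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)

for k in ["linear", "rbf"]:
    cf = SVC(kernel=k).fit(x, y)
    print(k, "training accuracy:", cf.score(x, y))
```

The RBF kernel implicitly maps the samples into a higher-dimensional space where the circles become separable, so its accuracy is far higher than the linear kernel's on this data.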
3. Soft margin
At this point the decision boundary no longer simply maximizes the margin: when the data are not perfectly separable, a larger margin means more misclassified samples, so we must balance "maximizing the margin" against "the number of misclassified samples". The parameter C weighs the two goals that cannot both be fully achieved, "classifying the training samples correctly" and "maximizing the margin of the decision function", in search of the balance point that gives the best model.
Code:
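A sketch of the effect of C (the overlapping dataset and the C values are illustrative assumptions): smaller C tolerates more margin violations, larger C penalizes them harder.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# overlapping clusters: no line can classify every sample correctly
x, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=1.5)

for c in [0.01, 1, 100]:
    cf = SVC(kernel="linear", C=c).fit(x, y)
    # smaller C tolerates more margin violations, so more points end up
    # inside the margin and become support vectors
    print("C =", c, "support vectors:", cf.n_support_.sum(),
          "accuracy:", cf.score(x, y))
```

As C shrinks, the margin widens and the number of support vectors grows; as C grows, the model tries harder to classify every training sample correctly, at the risk of overfitting.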