1. Introduction
Environment: Windows 10, Jupyter Notebook, Python 3.6
Support vector machine summary
This article gives a brief, intuitive introduction to the principles behind support vector machines (SVMs). They are a powerful classification method for several reasons:
- They rely on relatively few support vectors, meaning they are very compact models and take up little memory.
- Once the model is trained, the prediction phase is very fast.
- Because they are affected only by points near the margin, they work well with high-dimensional data, even data with more dimensions than samples, which is a challenging regime for other algorithms.
- The integration of kernel methods makes them very versatile and able to adapt to many types of data.
However, SVM also has several disadvantages:
- The cost scales with the number of samples N as O(N^3) in the worst case, or O(N^2) for efficient implementations. For large numbers of training samples, this computational cost can be prohibitive.
- The results depend strongly on a suitable choice of the softening parameter C. It must be chosen carefully through cross-validation, which becomes more expensive as the dataset grows.
- The results have no direct probabilistic interpretation. Probabilities can be estimated by an internal cross-validation (see the probability parameter of SVC), but this additional estimation is expensive.
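The probability estimate mentioned in the last point can be sketched as follows. This is a minimal illustration, assuming a small toy dataset from make_blobs and a linear kernel with C=1.0; the names X_demo, y_demo, and prob_model are made up for this example.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy data for illustration only (an assumption, not taken from the text above)
X_demo, y_demo = make_blobs(n_samples=100, centers=2,
                            random_state=0, cluster_std=0.8)

# probability=True makes fit() run an extra internal cross-validation
# to calibrate the scores, so training takes longer than a plain fit
prob_model = SVC(kernel='linear', C=1.0, probability=True, random_state=0)
prob_model.fit(X_demo, y_demo)

# One row per sample, one column per class; each row sums to 1
proba = prob_model.predict_proba(X_demo[:5])
print(proba)
```

Without probability=True, SVC only exposes predict and decision_function, whose values are signed distances rather than probabilities.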
Given these characteristics, I generally turn to SVMs only once simpler, faster, and less tuning-intensive methods have proven insufficient for my needs. However, if you can afford the CPU cycles to train and cross-validate an SVM on your data, the method can give excellent results.
References:
Python Data Science Handbook
https://jakevdp.github.io/PythonDataScienceHandbook/05.07-support-vector-machines.html
https://www.kesci.com/home/project/5be0480f954d6e0010618cef/code
Translation of github:
https://www.jianshu.com/p/864adfd2f795
2. Simple Linear SVM
1. First generate data
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns; sns.set()
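# Generate some random data. n_samples: 50 sample points; centers: number of cluster centers;
# random_state: random seed; cluster_std: spread of each cluster
# make_blobs is imported from sklearn.datasets (the samples_generator submodule was removed in newer releases)
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=50, centers=2,
                  random_state=0, cluster_std=0.60)
# Scatter plot of the data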
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
# print(X)
# print(y)
As shown in the picture:
2. Build a model
# Import the model; use a linear kernel and set the C parameter to a very large value
from sklearn.svm import SVC  # "Support vector classifier"
model = SVC(kernel='linear', C=1E10)
# Fit the SVM model to the data
model.fit(X, y)
Auxiliary drawing functions:
def plot_svc_decision_function(model, ax=None, plot_support=True):
    """Plot the decision function for a 2D SVC"""
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    # create grid to evaluate model
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)
    # plot decision boundary and margins
    # ax.contour draws three contour lines here: the boundary (0) and the two margins (-1, 1)
    # try changing parameters such as levels and alpha to see how the plot changes
    ax.contour(X, Y, P, colors='k',
               levels=[-1, 0, 1], alpha=0.5,
               linestyles=['--', '-', '--'])
    # plot support vectors
    # the call below circles the points closest to the decision boundary
    if plot_support:
        ax.scatter(model.support_vectors_[:, 0],
                   model.support_vectors_[:, 1],
                   s=300, linewidth=1, facecolors='none',
                   edgecolors='k')  # explicit edge color keeps the hollow markers visible
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
3. Result
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
plot_svc_decision_function(model)
Support vectors:
# Support vectors
model.support_vectors_
array([[0.44359863, 3.11530945],
[2.33812285, 3.43116792],
[2.06156753, 1.96918596]])
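For a linear kernel, the fitted model also exposes the separating line itself. The sketch below rebuilds the same data and model as above, then reads the normal vector and offset from coef_ and intercept_ and derives the margin width 2 / ||w||; the names w, b, margin_width, and manual are chosen for this example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Same data and model as in the steps above
X, y = make_blobs(n_samples=50, centers=2,
                  random_state=0, cluster_std=0.60)
model = SVC(kernel='linear', C=1E10).fit(X, y)

w = model.coef_.ravel()   # normal vector of the separating line
b = model.intercept_[0]   # offset of the separating line
margin_width = 2 / np.linalg.norm(w)  # distance between the two dashed margin lines

# For a linear kernel, the decision function is simply w . x + b
manual = X @ w + b
print(np.allclose(manual, model.decision_function(X)))
print(margin_width)
```

The support vectors are the points where this decision function is approximately +1 or -1, which is why they sit on the dashed margin lines in the plot.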
4. Try changing the data set
The result shows that the model is determined by the support vectors alone.
Next, let's fit the model on different numbers of data points to see whether the result changes,
using the first 60 and the first 120 points of the same dataset respectively.
def plot_svm(N=10, ax=None):
    X, y = make_blobs(n_samples=200, centers=2,
                      random_state=0, cluster_std=0.60)
    X = X[:N]
    y = y[:N]
    model = SVC(kernel='linear', C=1E10)
    model.fit(X, y)
    ax = ax or plt.gca()
    ax.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
    ax.set_xlim(-1, 4)
    ax.set_ylim(-1, 6)
    plot_svc_decision_function(model, ax)
fig, ax = plt.subplots(1, 2, figsize=(16, 6))
fig.subplots_adjust(left=0.0625, right=0.95, wspace=0.1)
for axi, N in zip(ax, [60, 120]):
    plot_svm(N, axi)
    axi.set_title('N = {0}'.format(N))
The left panel shows the result for 60 points and the right panel the result for 120 points. As long as the support vectors do not change, adding other data points has no effect on the model!
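One way to check this claim numerically is to refit the model on the support vectors alone and compare its predictions with those of the model fitted on all points. This is a minimal sketch using the 50-point dataset from section 2; the names X_all, y_all, full_model, sv_idx, and sv_model are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X_all, y_all = make_blobs(n_samples=50, centers=2,
                          random_state=0, cluster_std=0.60)
full_model = SVC(kernel='linear', C=1E10).fit(X_all, y_all)

# Indices of the support vectors in the training set
sv_idx = full_model.support_

# Refit using only the support vectors and their labels
sv_model = SVC(kernel='linear', C=1E10).fit(X_all[sv_idx], y_all[sv_idx])

# Both models should label every training point the same way
same_predictions = np.array_equal(full_model.predict(X_all),
                                  sv_model.predict(X_all))
print(len(sv_idx), same_predictions)
```

Points far from the margin carry no weight in the fit, so dropping them leaves the separating line essentially unchanged.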
5. Tips for using Notebook
In a notebook, you can use IPython's interactive widgets to explore this feature of the SVM model interactively:
from ipywidgets import interact, fixed
interact(plot_svm, N=[10, 200], ax=fixed(None))
3. Kernel function SVM
1. Data
from sklearn.datasets import make_circles
X, y = make_circles(100, factor=.1, random_state=0, noise=.1)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
2. Three-dimensional visualization
# Add a new dimension r
from mpl_toolkits import mplot3d  # registers the 3D projection on older Matplotlib versions
r = np.exp(-(X ** 2).sum(1))
def plot_3D(elev=30, azim=30, X=X, y=y):
    ax = plt.subplot(projection='3d')
    ax.scatter3D(X[:, 0], X[:, 1], r, c=y, s=50, cmap='autumn')
    ax.view_init(elev=elev, azim=azim)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('r')
plot_3D(elev=45, azim=45, X=X, y=y)
We can see that with this additional dimension the data becomes linearly separable, for example by drawing a separating plane at r = 0.7.
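The separation at r = 0.7 can be checked with a simple threshold. This is a minimal sketch that rebuilds the same circles data under new names (X_c, y_c, r_c); it assumes that make_circles labels the inner circle as class 1.

```python
import numpy as np
from sklearn.datasets import make_circles

X_c, y_c = make_circles(100, factor=.1, random_state=0, noise=.1)

# Same radial basis feature as above, centered on the origin
r_c = np.exp(-(X_c ** 2).sum(1))

# Points above the plane r = 0.7 are assigned to the inner circle
# (assumption: make_circles uses label 1 for the inner circle)
pred = (r_c > 0.7).astype(int)
accuracy = (pred == y_c).mean()
print(accuracy)
```

A single threshold in the new dimension does the job that no straight line could do in the original two dimensions.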
Here we had to choose and carefully tune our projection:
if we had not centered the radial basis function in the right place, we would not have seen such a clean, linearly separable result.
In general, the need to make such a choice is a problem: we want to somehow automatically find the best basis functions to use.
One strategy is to compute a basis function centered on every point in the dataset and let the SVM algorithm sift through the results. This type of basis function transformation is called a kernel transformation because it is based on a similarity relationship (or kernel) between each pair of points.
A potential problem with this strategy, projecting N points into N dimensions, is that it can become very computationally expensive as N grows. However, thanks to a neat little procedure called the kernel trick, a fit on kernel-transformed data can be done implicitly, that is, without ever constructing the full N-dimensional representation of the kernel projection! This kernel trick is built into the SVM and is one of the reasons the method is so powerful.
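The sketch below illustrates what "implicitly" means in practice: the fitted model's decision function can be rebuilt from pairwise kernel values between the support vectors and the query points alone. It assumes an RBF kernel with gamma fixed at 1.0 (so that the kernel can be recomputed by hand) and uses names made up for this example (X_k, y_k, clf_k, K, manual_decision).

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X_k, y_k = make_circles(100, factor=.1, random_state=0, noise=.1)

gamma = 1.0  # assumed value, fixed explicitly so the kernel can be recomputed
clf_k = SVC(kernel='rbf', gamma=gamma, C=1E6).fit(X_k, y_k)

# K[i, j] = exp(-gamma * ||sv_i - x_j||^2): similarity of support vector i to point j
K = rbf_kernel(clf_k.support_vectors_, X_k, gamma=gamma)

# f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b
# dual_coef_ holds the products alpha_i * y_i for the support vectors
manual_decision = (clf_k.dual_coef_ @ K + clf_k.intercept_).ravel()
print(np.allclose(manual_decision, clf_k.decision_function(X_k), atol=1e-6))
```

Only kernel values between pairs of points are ever needed; the high-dimensional projection itself is never built.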
3. Model construction, adding radial basis functions
# Use a radial basis function (RBF) kernel
clf = SVC(kernel='rbf', C=1E6)
clf.fit(X, y)
4. Draw
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
plot_svc_decision_function(clf)
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            s=300, lw=1, facecolors='none', edgecolors='k');
Using this kernel support vector machine, we learn a suitable nonlinear decision boundary. This kernel transformation strategy is often used in machine learning!
4. Tuning the SVM: soft margins
The SVM implementation has a softening factor that "softens" the margin: it allows some points to enter the margin if that gives a better fit.
The hardness of the margin is controlled by a tuning parameter, usually called C.
For very large C, the margin is hard and points cannot lie inside it.
For smaller C, the margin is softer and can grow to contain some points.
Adjusting the C parameter
- As C approaches infinity, the classification is strict and no errors are allowed.
- As C becomes small, more errors are tolerated.
The optimal value of C depends on your dataset and should be tuned using cross-validation or a similar procedure.
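The cross-validation step can be sketched with GridSearchCV. This is a minimal illustration, assuming the blob dataset used in this section and an arbitrary grid of candidate C values; the names X_cv, y_cv, param_grid, and grid are made up for this example.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X_cv, y_cv = make_blobs(n_samples=100, centers=2,
                        random_state=0, cluster_std=0.8)

# Candidate values chosen for illustration; a real search may need a wider range
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}

# 5-fold cross-validation over every candidate value of C
grid = GridSearchCV(SVC(kernel='linear'), param_grid, cv=5)
grid.fit(X_cv, y_cv)

print(grid.best_params_, grid.best_score_)
```

Every candidate is fitted once per fold, which is the overhead that grows with the dataset size.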
1. Data
X, y = make_blobs(n_samples=100, centers=2,
                  random_state=0, cluster_std=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn');
2. C = 10 and C = 0.1
fig, ax = plt.subplots(1, 2, figsize=(16, 6))
fig.subplots_adjust(left=0.0625, right=0.95, wspace=0.1)
for axi, C in zip(ax, [10.0, 0.1]):
    model = SVC(kernel='linear', C=C).fit(X, y)
    axi.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
    plot_svc_decision_function(model, axi)
    axi.scatter(model.support_vectors_[:, 0],
                model.support_vectors_[:, 1],
                s=300, lw=1, facecolors='none', edgecolors='k');
    axi.set_title('C = {0:.1f}'.format(C), size=14)
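The effect of C can also be read off without a plot by counting the support vectors. This is a small sketch on the same data, with names (X_s, y_s, n_support, m) chosen for this example; a softer margin is expected to contain more points, so the count should not shrink as C decreases.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X_s, y_s = make_blobs(n_samples=100, centers=2,
                      random_state=0, cluster_std=0.8)

n_support = {}
for C in [10.0, 0.1]:
    m = SVC(kernel='linear', C=C).fit(X_s, y_s)
    # Points on or inside the margin become support vectors
    n_support[C] = len(m.support_vectors_)
    print('C = {0:.1f}: {1} support vectors'.format(C, n_support[C]))
```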
3. gamma = 10 and gamma = 0.1
X, y = make_blobs(n_samples=100, centers=2,
                  random_state=0, cluster_std=1.1)
fig, ax = plt.subplots(1, 2, figsize=(16, 6))
fig.subplots_adjust(left=0.0625, right=0.95, wspace=0.1)
for axi, gamma in zip(ax, [10.0, 0.1]):
    model = SVC(kernel='rbf', gamma=gamma).fit(X, y)
    axi.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='autumn')
    plot_svc_decision_function(model, axi)
    axi.scatter(model.support_vectors_[:, 0],
                model.support_vectors_[:, 1],
                s=300, lw=1, facecolors='none', edgecolors='k');
    axi.set_title('gamma = {0:.1f}'.format(gamma), size=14)
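A larger gamma makes each RBF bump narrower, which tends to give a more convoluted boundary that follows the training points closely. One way to judge whether that helps is to compare training accuracy with cross-validated accuracy. This is a minimal sketch on the same data; the names X_g, y_g, scores, and m are made up for this example.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X_g, y_g = make_blobs(n_samples=100, centers=2,
                      random_state=0, cluster_std=1.1)

scores = {}
for gamma in [10.0, 0.1]:
    m = SVC(kernel='rbf', gamma=gamma)
    # Accuracy on the data the model was fitted on
    train_acc = m.fit(X_g, y_g).score(X_g, y_g)
    # Mean accuracy over 5 held-out folds (the model is refitted on each fold)
    cv_acc = cross_val_score(m, X_g, y_g, cv=5).mean()
    scores[gamma] = (train_acc, cv_acc)
    print('gamma = {0:.1f}: train = {1:.2f}, cv = {2:.2f}'.format(
        gamma, train_acc, cv_acc))
```

A large gap between the two numbers for a given gamma suggests that the boundary is fitting noise in the training set.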
5. Face recognition with SVM
We use the Labeled Faces in the Wild dataset, which contains several thousand collated photos of various public figures. A fetcher for the dataset is built into Scikit-Learn.