Machine Learning - SVM

1. Code Implementation

#!/usr/bin/python
# -*- coding: utf-8 -*-


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib as mpl
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


def load_data():
    path = 'E:\数据挖掘\Machine learning\[小象学院]机器学习课件\8.Regression代码\8.Regression\iris.data'
    # path to the iris data file
    data = pd.read_csv(path, header=None)
    # the first four columns are the features x, the fifth column is the label y
    x, y = data[range(4)], data[4]
    # encode the three string class labels as integer codes 0 / 1 / 2
    y = pd.Categorical(y).codes
    # keep only the first two features so the decision regions can be plotted in 2D
    x = x[[0, 1]]
    # x = x[[0, 2]]
    return x, y



def classifier(x, y):
    # The iris data set has four features and three classes:
    # Iris-setosa (0), Iris-versicolor (1), Iris-virginica (2)
    iris_feature = 'sepal length', 'sepal width', 'petal length', 'petal width'
    # Split the data 60% / 40% into training and test sets;
    # a fixed random_state makes the split reproducible
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1, train_size=0.6)
    # Train an SVM classifier; the key hyperparameters are C, gamma and kernel.
    # kernel='linear' is the linear kernel; kernel='rbf' (the default) is the Gaussian kernel.
    # Larger C penalizes margin violations more heavily, so the model fits the training data
    # more closely (and risks overfitting).
    # Smaller gamma gives a smoother, more continuous decision boundary; larger gamma gives a
    # more fragmented boundary that hugs the training data (and risks overfitting).
    # decision_function_shape='ovr' (one vs. rest) scores each class against all the others;
    # decision_function_shape='ovo' (one vs. one) splits the classes pairwise and simulates
    # the multi-class result with binary classifiers.
    clf = svm.SVC(C=0.8, kernel='rbf', gamma=20, decision_function_shape='ovr')
    clf.fit(x_train, y_train.ravel())
    # For a classifier, score() returns the mean accuracy, i.e. the fraction of samples
    # classified correctly (the same value as accuracy_score).
    # (recall = correctly retrieved positives / all positives in the sample)
    print(clf.score(x_train, y_train))
    print('Training set accuracy:', accuracy_score(y_train, clf.predict(x_train)))
    print(clf.score(x_test, y_test))
    print('Test set accuracy:', accuracy_score(y_test, clf.predict(x_test)))

    # decision_function() returns a score for each sample and each class
    # (with 'ovr', one column per class)
    print('decision_function:\n', clf.decision_function(x_train))
    print('\npredict:\n', clf.predict(x_train))

    # Plot the decision regions
    x1_min, x2_min = x.min()  # minimum of each of the two feature columns
    x1_max, x2_max = x.max()  # maximum of each of the two feature columns
    x1, x2 = np.mgrid[x1_min:x1_max:500j, x2_min:x2_max:500j]  # grid of sample points
    grid_test = np.stack((x1.flat, x2.flat), axis=1)  # grid points used as test inputs
    # print('grid_test = \n', grid_test)
    # Z = clf.decision_function(grid_test)    # distance of the grid points to the decision surface
    # print(Z)
    grid_hat = clf.predict(grid_test)  # predicted class for every grid point
    grid_hat = grid_hat.reshape(x1.shape)  # reshape to match the grid
    mpl.rcParams['font.sans-serif'] = ['SimHei']
    mpl.rcParams['axes.unicode_minus'] = False

    cm_light = mpl.colors.ListedColormap(['#A0FFA0', '#FFA0A0', '#A0A0FF'])
    cm_dark = mpl.colors.ListedColormap(['g', 'r', 'b'])
    plt.figure(facecolor='w')
    plt.pcolormesh(x1, x2, grid_hat, cmap=cm_light)
    plt.scatter(x[0], x[1], c=y, edgecolors='k', s=50, cmap=cm_dark)  # all samples
    plt.scatter(x_test[0], x_test[1], s=120, facecolors='none', zorder=10)  # circle the test samples
    plt.xlabel(iris_feature[0], fontsize=13)
    plt.ylabel(iris_feature[1], fontsize=13)
    plt.xlim(x1_min, x1_max)
    plt.ylim(x2_min, x2_max)
    plt.title('Iris SVM classification on two features', fontsize=16)
    plt.grid(True, ls=':')
    plt.tight_layout(pad=1.5)
    plt.show()


if __name__ == "__main__":
    x, y = load_data()
    classifier(x, y)
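
The comments in classifier() distinguish decision_function_shape='ovr' from 'ovo'. A minimal standalone sketch of the practical difference, assuming scikit-learn is available; the four-class make_blobs data below is purely illustrative and is used only so that the two score matrices have visibly different shapes:

# Shape of the scores returned by decision_function() under 'ovr' vs. 'ovo'.
from sklearn import svm
from sklearn.datasets import make_blobs

# four classes, so the two settings give score matrices of different widths
X, y = make_blobs(n_samples=200, centers=4, random_state=1)

for shape in ('ovr', 'ovo'):
    clf = svm.SVC(kernel='rbf', decision_function_shape=shape).fit(X, y)
    print(shape, clf.decision_function(X).shape)
    # 'ovr' -> (200, 4): one score per class
    # 'ovo' -> (200, 6): one score per pair of classes (4*3/2)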

2. Formula Derivation

The distance from any point \(x\) in the sample space to the hyperplane \((w, b)\) can be written as:

\[ r = \frac{|w^Tx + b|}{||w||} \]

\[ \text{Derivation: let } x_0 \text{ be the projection of } x \text{ onto the hyperplane, so } w^Tx_0 + b = 0. \\ |w^T(x - x_0)| = \|w\|\, r \\ \text{and } |w^T(x - x_0)| = |w^Tx - w^Tx_0| = |w^Tx + b| \\ \therefore r = \frac{|w^Tx + b|}{\|w\|} \]
\[ \hat r = y f(x) = y(w^Tx + b) \\ \tilde r = \frac{y(w^Tx + b)}{\|w\|} = \frac{\hat r}{\|w\|} \\ \hat r \text{ is called the functional margin and } \tilde r \text{ the geometric margin.} \]
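
A quick numeric check of the distance formula; the values of \(w\), \(b\) and \(x\) below are arbitrary illustration values:

# r = |w^T x + b| / ||w||, the distance from a point x to the hyperplane w^T x + b = 0
import numpy as np

w = np.array([3.0, 4.0])
b = -5.0
x = np.array([2.0, 1.0])

r = abs(w @ x + b) / np.linalg.norm(w)   # |3*2 + 4*1 - 5| / 5 = 1.0
print(r)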

\[ L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\frac{1}{2}\|\boldsymbol{w}\|^{2}+\sum_{i=1}^{m} \alpha_{i}\left(1-y_{i}\left(\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b\right)\right) \]

The primal problem is the minimax problem \(\min_{\boldsymbol{w},b}\ \max_{\boldsymbol{\alpha}}\ L(\boldsymbol{w},b,\boldsymbol{\alpha})\); it is converted into the equivalent maximin problem \(\max_{\boldsymbol{\alpha}}\ \min_{\boldsymbol{w},b}\ L(\boldsymbol{w},b,\boldsymbol{\alpha})\).

Derivation of the Lagrangian: the objective is \(\min \frac{1}{2}\|w\|^2\) subject to the constraints \(y_i(w^Tx_i + b) \geq 1\). Multiplying each constraint, written as \(1 - y_i(w^Tx_i + b) \leq 0\), by a multiplier \(\alpha_i \geq 0\) and adding it to the objective gives

\[ L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\frac{1}{2}\|\boldsymbol{w}\|^{2}+\sum_{i=1}^{m} \alpha_{i}\left(1-y_{i}\left(\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b\right)\right) \]

The Lagrangian is sometimes written equivalently as \(L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\frac{1}{2}\|\boldsymbol{w}\|^{2}-\sum_{i=1}^{m} \alpha_{i}\left(y_{i}\left(\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b\right)-1\right)\); the two forms are the same. Setting the partial derivatives of \(L\) with respect to \(\boldsymbol{w}\) and \(b\) to zero gives
\[ \begin{aligned} \boldsymbol{w} &= \sum_{i=1}^m \alpha_i y_i \boldsymbol{x}_i \\ 0 &= \sum_{i=1}^m \alpha_i y_i \end{aligned} \]

\[\text{Derivation:}\\ \begin{aligned} L(\boldsymbol{w},b,\boldsymbol{\alpha}) &= \frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m\alpha_i(1-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)) \\ & = \frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m(\alpha_i-\alpha_iy_i \boldsymbol{w}^T\boldsymbol{x}_i-\alpha_iy_ib)\\ & =\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}+\sum_{i=1}^m\alpha_i -\sum_{i=1}^m\alpha_iy_i\boldsymbol{w}^T\boldsymbol{x}_i-\sum_{i=1}^m\alpha_iy_ib \end{aligned}\]

\[\frac {\partial L}{\partial \boldsymbol{w}}=\frac{1}{2}\times2\times\boldsymbol{w} + 0 - \sum_{i=1}^{m}\alpha_iy_i \boldsymbol{x}_i-0= 0 \Longrightarrow \boldsymbol{w}=\sum_{i=1}^{m}\alpha_iy_i \boldsymbol{x}_i\]

\[ \frac{\partial L}{\partial b} = 0 + 0 - 0 - \sum_{i=1}^{m}\alpha_i y_i = 0 \Longrightarrow \sum_{i=1}^{m}\alpha_i y_i = 0 \]
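
The two stationarity conditions can be verified symbolically. A minimal SymPy sketch for \(m = 2\) one-dimensional samples; the symbol names are illustrative only:

# Check dL/dw = 0  =>  w = sum_i alpha_i y_i x_i, and dL/db = -sum_i alpha_i y_i,
# for the hard-margin Lagrangian with two scalar samples.
import sympy as sp

w, b = sp.symbols('w b', real=True)
a1, a2, y1, y2, x1, x2 = sp.symbols('alpha1 alpha2 y1 y2 x1 x2', real=True)

L = sp.Rational(1, 2) * w**2 \
    + a1 * (1 - y1 * (w * x1 + b)) \
    + a2 * (1 - y2 * (w * x2 + b))

print(sp.solve(sp.Eq(sp.diff(L, w), 0), w))  # [alpha1*x1*y1 + alpha2*x2*y2]
print(sp.diff(L, b))                         # -alpha1*y1 - alpha2*y2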
\[ \begin{aligned} \max_{\boldsymbol{\alpha}} \quad & \sum_{i=1}^m \alpha_i - \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m \alpha_i\alpha_j y_i y_j \boldsymbol{x}_i^T \boldsymbol{x}_j \\ \text{s.t.} \quad & \sum_{i=1}^m \alpha_i y_i = 0 \\ & \alpha_i \geq 0, \quad i = 1,2,\dots,m \end{aligned} \]
Derivation: substitute the two conditions above back into the Lagrangian:

\[\begin{aligned} \min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) &=\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}+\sum_{i=1}^m\alpha_i -\sum_{i=1}^m\alpha_iy_i\boldsymbol{w}^T\boldsymbol{x}_i-\sum_{i=1}^m\alpha_iy_ib \\ &=\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i-\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_ i -b\sum _{i=1}^m\alpha_iy_i \\ & = -\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i -b\sum _{i=1}^m\alpha_iy_i \end{aligned}\]

\[\begin{aligned} \min_{\boldsymbol{w},b} L(\boldsymbol{w},b,\boldsymbol{\alpha}) &= -\frac {1}{2}\boldsymbol{w}^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i \\ &=-\frac {1}{2}(\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i)^T(\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i)+\sum _{i=1}^m\alpha_i \\ &=-\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i \\ &=\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \end{aligned}\]
\[ \begin{aligned} & \min_{\boldsymbol{\alpha}}\frac{1}{2}\sum_{i = 1}^m\sum_{j=1}^m\alpha_i \alpha_j y_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j- \sum_{i=1}^m\alpha_i\\ & s.t. \sum_{i=1}^m \alpha_i y_i =0 \\ & \alpha_i \geq 0 \quad i=1,2,\dots ,m \end{aligned} \]
Negating the objective of \(\max_{\boldsymbol{\alpha}}\ \min_{\boldsymbol{w},b}\ L(\boldsymbol{w},b,\boldsymbol{\alpha})\) reformulates it as the constrained minimization problem above; solving it yields the optimal solution \(\alpha^*\). Then compute
\[ \begin{aligned} w^* &= \sum_{i=1}^m \alpha_i^* y_i x_i \\ b^* &= y_j - \sum_{i=1}^m \alpha_i^* y_i (x_i \cdot x_j) \quad \text{for any } j \text{ with } \alpha_j^* > 0 \end{aligned} \]
Separating hyperplane: \(w^* \cdot x + b^* = 0\). Classification decision function: \(f(x) = \operatorname{sign}(w^* \cdot x + b^*)\).
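
A minimal numerical sketch of this recipe, assuming SciPy is available (a real SVM implementation would use SMO or a dedicated QP solver): solve the dual with a generic constrained optimizer, then recover \(w^*\) and \(b^*\) from \(\alpha^*\). The four toy points are arbitrary but linearly separable.

# Solve the hard-margin dual for a tiny 2-D toy problem, then recover w* and b*.
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
G = (y[:, None] * X) @ (y[:, None] * X).T        # G_ij = y_i y_j (x_i . x_j)

def neg_dual(a):                                 # negate the dual so we can minimize it
    return 0.5 * a @ G @ a - a.sum()

cons = ({'type': 'eq', 'fun': lambda a: a @ y},) # sum_i alpha_i y_i = 0
bnds = [(0.0, None)] * len(y)                    # alpha_i >= 0
res = minimize(neg_dual, np.zeros(len(y)), bounds=bnds, constraints=cons)

alpha = res.x                                    # alpha*
w = ((alpha * y)[:, None] * X).sum(axis=0)       # w* = sum_i alpha_i y_i x_i
j = int(np.argmax(alpha))                        # a support vector index (alpha_j > 0)
b = y[j] - (alpha * y) @ (X @ X[j])              # b* = y_j - sum_i alpha_i y_i (x_i . x_j)
print('w* =', w, 'b* =', b)
print('margins:', y * (X @ w + b))               # each should be >= 1 up to solver tolerance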

Introducing slack variables \(\xi_i\), the objective becomes:
\[ \begin{aligned} \min_{\boldsymbol{w},b,\boldsymbol{\xi}} \quad & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^m \xi_i \\ \text{s.t.} \quad & y_i(w \cdot x_i + b) \geq 1 - \xi_i, \quad i = 1,2,\dots,m \\ & \xi_i \geq 0, \quad i = 1,2,\dots,m \end{aligned} \]

As before, construct the Lagrangian \(L\), set its partial derivatives with respect to \(w\), \(b\), \(\xi\) to zero, and substitute the results back into \(L\):
\[ L(\boldsymbol{w}, b, \boldsymbol{\alpha}, \boldsymbol{\xi}, \boldsymbol{\mu}) = \frac{1}{2}\|\boldsymbol{w}\|^2 + C\sum_{i=1}^m \xi_i + \sum_{i=1}^m \alpha_i\left(1 - \xi_i - y_i(\boldsymbol{w}^T\boldsymbol{x}_i + b)\right) - \sum_{i=1}^m \mu_i \xi_i \]
Taking the partial derivatives with respect to \(w\), \(b\), \(\xi\) gives
\[ \begin{aligned} \boldsymbol{w} &= \sum_{i=1}^m \alpha_i y_i \boldsymbol{x}_i \\ 0 &= \sum_{i=1}^m \alpha_i y_i \\ C &= \alpha_i + \mu_i \end{aligned} \]

Substituting these back into \(L\):
\[ \begin{aligned} \min_{\boldsymbol{w},b,\boldsymbol{\xi}}L(\boldsymbol{w},b,\boldsymbol{\alpha},\boldsymbol{\xi},\boldsymbol{\mu}) &= \frac{1}{2}||\boldsymbol{w}||^2+C\sum_{i=1}^m \xi_i+\sum_{i=1}^m \alpha_i(1-\xi_i-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b))-\sum_{i=1}^m\mu_i \xi_i \\ &=\frac{1}{2}||\boldsymbol{w}||^2+\sum_{i=1}^m\alpha_i(1-y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b))+C\sum_{i=1}^m \xi_i-\sum_{i=1}^m \alpha_i \xi_i-\sum_{i=1}^m\mu_i \xi_i \\ & = -\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i +\sum_{i=1}^m C\xi_i-\sum_{i=1}^m \alpha_i \xi_i-\sum_{i=1}^m\mu_i \xi_i \\ & = -\frac {1}{2}\sum_{i=1}^{m}\alpha_iy_i\boldsymbol{x}_i^T\sum _{i=1}^m\alpha_iy_i\boldsymbol{x}_i+\sum _{i=1}^m\alpha_i +\sum_{i=1}^m (C-\alpha_i-\mu_i)\xi_i \\ &=\sum _{i=1}^m\alpha_i-\frac {1}{2}\sum_{i=1 }^{m}\sum_{j=1}^{m}\alpha_i\alpha_jy_iy_j\boldsymbol{x}_i^T\boldsymbol{x}_j \end{aligned} \]
Then maximize over \(\alpha\):
\[ \max_{\boldsymbol{\alpha}} \quad \sum_{i=1}^m \alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j \boldsymbol{x}_i^T\boldsymbol{x}_j \]
which is equivalent to
\[ \begin{aligned} \min_{\boldsymbol{\alpha}} \quad & \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j \boldsymbol{x}_i^T\boldsymbol{x}_j - \sum_{i=1}^m \alpha_i \\ \text{s.t.} \quad & \sum_{i=1}^m \alpha_i y_i = 0 \\ & 0 \leq \alpha_i \leq C, \quad i = 1,2,\dots,m \end{aligned} \]
Solving this gives the optimal solution \(\alpha^*\). Then compute
\[ \begin{aligned} w^* &= \sum_{i=1}^m \alpha_i^* y_i x_i \\ b^* &= -\frac{1}{2}\left(\max_{i:\,y_i=-1} w^* \cdot x_i + \min_{i:\,y_i=1} w^* \cdot x_i\right) \end{aligned} \]
Separating hyperplane: \(w^* \cdot x + b^* = 0\). Classification decision function: \(f(x) = \operatorname{sign}(w^* \cdot x + b^*)\).
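
As a rough illustration of how \(C\) trades margin width against margin violations, a sketch assuming scikit-learn; the synthetic data and the particular \(C\) values are arbitrary:

# Effect of C on a soft-margin SVM: small C tolerates more violations
# (typically more support vectors and a wider margin), large C penalizes them heavily.
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = svm.SVC(kernel='linear', C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_)     # geometric margin width 2 / ||w||
    print(f'C={C:<6} support vectors={clf.n_support_.sum():3d} margin={margin:.3f}')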

\[ \left\{\begin{array}{l} {\alpha_{i}\left(f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i}\right)=0} \\ {\hat{\alpha}_{i}\left(y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i}\right)=0} \\ {\alpha_{i} \hat{\alpha}_{i}=0, \xi_{i} \hat{\xi}_{i}=0} \\ {\left(C-\alpha_{i}\right) \xi_{i}=0,\left(C-\hat{\alpha}_{i}\right) \hat{\xi}_{i}=0} \end{array}\right. \]
Derivation: the inequality constraints of the SVR primal problem are
\[ \left\{\begin{array}{l} f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i} \leq 0 \\ y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i} \leq 0 \\ -\xi_{i} \leq 0 \\ -\hat{\xi}_{i} \leq 0 \end{array}\right. \]
and applying KKT complementary slackness to each of them gives

\[ \left\{\begin{array}{l} {\alpha_i\left(f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i} \right) = 0 } \\ {\hat{\alpha}_i\left(y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i} \right) = 0 } \\ {-\mu_i\xi_{i} = 0 \Rightarrow \mu_i\xi_{i} = 0 } \\ {-\hat{\mu}_i \hat{\xi}_{i} = 0 \Rightarrow \hat{\mu}_i \hat{\xi}_{i} = 0 } \end{array}\right. \]

\[ \because \quad \mu_i = C - \alpha_i, \qquad \hat{\mu}_i = C - \hat{\alpha}_i \]
\[ \therefore \left\{\begin{array}{l} {\alpha_i\left(f\left(\boldsymbol{x}_{i}\right)-y_{i}-\epsilon-\xi_{i}\right)=0} \\ {\hat{\alpha}_i\left(y_{i}-f\left(\boldsymbol{x}_{i}\right)-\epsilon-\hat{\xi}_{i}\right)=0} \\ {\left(C-\alpha_i\right) \xi_{i}=0} \\ {\left(C-\hat{\alpha}_i\right) \hat{\xi}_{i}=0} \end{array}\right. \]
Hard-margin and soft-margin SVMs both handle (approximately) linearly separable problems; for linearly inseparable problems the data must be mapped from the low-dimensional space to a higher-dimensional space, which is where kernel functions come in.

The polynomial kernel, the Gaussian kernel, and the SMO algorithm are not yet fully understood; notes on them will be added later.
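
Even before working through the kernel math, the effect of the implicit high-dimensional mapping can be seen empirically. A minimal sketch assuming scikit-learn; make_circles is used only as a convenient non-linearly-separable example:

# A linear kernel cannot separate concentric circles, while polynomial and
# Gaussian (RBF) kernels implicitly map the data to a space where it is separable.
from sklearn import svm
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=400, factor=0.3, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, clf in [('linear', svm.SVC(kernel='linear')),
                  ('poly, degree 2', svm.SVC(kernel='poly', degree=2, coef0=1.0)),
                  ('rbf', svm.SVC(kernel='rbf', gamma='scale'))]:
    clf.fit(X_train, y_train)
    # linear stays near chance level; poly and rbf should score close to 1.0
    print(name, clf.score(X_test, y_test))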

3. Practice Questions

3.1 Given three data points, positive points \(x_1 = (3, 3)^T\) and \(x_2 = (4, 3)^T\) and negative point \(x_3 = (1, 1)^T\), find the linearly separable (hard-margin) SVM.
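
A hedged numerical check for 3.1 (not a substitute for the analytic solution), assuming scikit-learn: a linear-kernel SVC with a very large C behaves approximately like a hard-margin SVM.

# Numerical check for question 3.1.
import numpy as np
from sklearn import svm

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

clf = svm.SVC(kernel='linear', C=1e6).fit(X, y)  # huge C approximates a hard margin
print(clf.coef_, clf.intercept_)                 # close to w = (0.5, 0.5), b = -2
print(clf.support_)                              # the support vectors are x1 and x3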

3.2 Can SVM be used for multi-class classification?

3.3 Compare SVM and logistic regression.

3.4 What is a kernel function? How does the Gaussian kernel map to an infinite-dimensional space?

3.5 How should the loss function of SVM be understood?

3.6 Using the Gaussian kernel, describe how the parameters \(C\) and \(\sigma\) affect the classifier.
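
One way to build intuition for 3.6 is to sweep \(C\) and gamma and compare training and test accuracy (in scikit-learn's RBF kernel, gamma plays the role of \(1/(2\sigma^2)\)). A sketch reusing load_data() from Section 1, so it assumes the iris.data path there is valid; the grid values are arbitrary:

# Train/test accuracy on the two-feature iris data over a grid of C and gamma values.
# Larger C and larger gamma tend to fit the training set more closely, at the risk
# of losing test accuracy; very small values tend to underfit.
from sklearn import svm
from sklearn.model_selection import train_test_split

x, y = load_data()
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1, train_size=0.6)

for C in (0.1, 1, 100):
    for gamma in (0.1, 1, 100):
        clf = svm.SVC(C=C, kernel='rbf', gamma=gamma).fit(x_train, y_train)
        print(f'C={C:<5} gamma={gamma:<5} '
              f'train={clf.score(x_train, y_train):.2f} test={clf.score(x_test, y_test):.2f}')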

3.7 Compare the dual form of the perceptron with the dual form of the linearly separable support vector machine.

3.8 Prove that a positive integer power of the inner product is a positive definite kernel:

\[ K(x, z) = (x \cdot z)^p \]

where \(p\) is a positive integer and \(x, z \in \mathbb{R}^n\).

3.9 The linear support vector machine can also be defined in the following form:

\[ \begin{aligned} \min_{\boldsymbol{w},b,\boldsymbol{\xi}} \quad & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N {\xi_i}^2 \\ \text{s.t.} \quad & y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) \geq 1 - \xi_i, \quad i = 1,2,\dots,N \\ & \xi_i \geq 0, \quad i = 1,2,\dots,N \end{aligned} \]
Find its dual form.

3.10 Given the positive points \(x_1 = (1, 2)^T\), \(x_2 = (2, 3)^T\), \(x_3 = (3, 3)^T\) and the negative points \(x_4 = (2, 1)^T\), \(x_5 = (3, 2)^T\), find the maximum-margin separating hyperplane and the classification decision function, and draw the separating hyperplane, the margin boundaries, and the support vectors.

3.11 Analyze why SVM is sensitive to noise.

3.12 Use the kernel trick to extend logistic regression, producing kernelized logistic regression.

3.13 Derive the KKT conditions given by Equation (6.52).

3.14 Discuss under what conditions linear discriminant analysis and linear-kernel SVM are equivalent.

4. References

[1] "machine learning" Zou Bo

[2] "SVM Threefold" July

[3] 《pumpkin-book》 Datawhale

[4] "machine learning" Zhou Zhihua

[5] "machine learning real" Peter

[6] "statistical learning methods," Li Hang, Tsinghua University Press, 2012
