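SVM (support vector machine) is a widely used machine learning classifier. It classifies by maximizing the margin between classes; the training points that lie on the margin are called support vectors. It is a supervised model.

A brief look at how SVM works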

SVM achieves maximum-margin classification by constraining how the prediction $y'=wx+b$ relates to the true label $y$. That is,

$wx+b\geqslant 1 , y=1\\ wx+b\leqslant -1 , y=-1$
Quite simply, the distance between the two supporting hyperplanes is $\frac {2}{\left \| w \right \|}$, so minimizing $\left \| w \right \|^{2}$ maximizes the margin between the support vectors.
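
The two constraints above can be folded into the single condition $y(wx+b)\geqslant 1$. The sketch below checks that condition for a few hand-picked points. The function name `satisfies_margin` and the values of `w`, `b`, and the points are made up for illustration; they are not part of the original code.

```python
def satisfies_margin(w, b, x, y):
    """True when the labelled point (x, y) meets y * (w.x + b) >= 1, with y in {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return y * score >= 1.0

# Illustrative values only (assumptions, not taken from the post).
w, b = [3.0, 4.0], 0.0
ok_pos = satisfies_margin(w, b, [1.0, 1.0], 1)      # score 7.0, outside the margin
ok_neg = satisfies_margin(w, b, [-1.0, -1.0], -1)   # score -7.0, outside the margin
too_close = satisfies_margin(w, b, [0.1, 0.1], -1)  # score 0.7, violates the constraint
```

A point that fails this check is exactly the kind of point the hinge term in the loss penalizes.
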
Using a Lagrangian transformation, the final loss is defined as:

$loss=hinge(y',y)+\left \| w \right \|^{2}\\ hinge(y',y)=max(1-y(wx+b),0)$
ref: Computing the SVM classifier
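
As a worked example of the loss above, here is a small pure-Python sketch. Averaging the hinge term over the batch is an assumption (the formula does not say how samples are combined), and the names `hinge` and `svm_loss` as well as the sample values are made up for illustration.

```python
def hinge(score, y):
    """Hinge loss max(1 - y * score, 0) for a label y in {-1, +1}."""
    return max(1.0 - y * score, 0.0)

def svm_loss(w, b, points, labels):
    """Mean hinge loss over the batch (an assumption) plus the squared L2 norm of w."""
    total = 0.0
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += hinge(score, y)
    return total / len(points) + sum(wi * wi for wi in w)

# Illustrative values only.
w, b = [-0.5, 0.5], 0.0
points = [[1.0, 4.0], [4.0, 1.0], [2.0, 2.5]]
labels = [1, -1, 1]
loss = svm_loss(w, b, points, labels)
```

Only the third point lies inside the margin (score 0.25, hinge 0.75); the first two contribute nothing, so the loss is the mean hinge 0.25 plus the penalty 0.5.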

The SVM core in 7 lines of code

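import tensorflow as tf  # TensorFlow 1.x API

# next_x, next_y are the batch tensors produced by the input pipeline shown later
w = tf.Variable(tf.random_uniform([1, 2]))
b = tf.Variable(tf.random_uniform((1,)))
logit = tf.matmul(w, next_x, transpose_b=True) + b  # w x + b, shape [1, batch]
logit = tf.tanh(logit)  # squash the score into (-1, 1)
prediction = tf.tanh(tf.matmul(w, next_x, transpose_b=True) + b)
# hinge loss plus an L2 penalty on w (tf.norm returns ||w||, not ||w||^2)
loss = tf.losses.hinge_loss(next_y, logit) + tf.norm(w, 2)
opt = tf.train.RMSPropOptimizer(1e-2).minimize(loss)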

Traditional SVMs are solved with the SMO algorithm, but since we have local optimizers such as RMSProp and Adam, we can simply use those instead.
ref: Implementation

The parameters of the maximum-margin hyperplane are derived by solving the optimization. There exist several specialized algorithms for quickly solving the QP problem that arises from SVMs, mostly relying on heuristics for breaking the problem down into smaller, more-manageable chunks.

Another approach is to use an interior point method that uses Newton-like iterations to find a solution of the Karush–Kuhn–Tucker conditions of the primal and dual problems.[34] Instead of solving a sequence of broken down problems, this approach directly solves the problem altogether. To avoid solving a linear system involving the large kernel matrix, a low rank approximation to the matrix is often used in the kernel trick.

Another common method is Platt’s **sequential minimal optimization (SMO)** algorithm, which breaks the problem down into 2-dimensional sub-problems that are solved analytically, eliminating the need for a numerical optimization algorithm and matrix storage. This algorithm is conceptually simple, easy to implement, generally faster, and has better scaling properties for difficult SVM problems.
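
To make the gradient-based alternative concrete without TensorFlow, here is a minimal pure-Python sketch. It uses plain per-sample subgradient descent rather than RMSProp, and labels in {-1, +1}. The function name `train_linear_svm`, the hyperparameters (`lr`, `reg`, `epochs`), and the synthetic data are all assumptions chosen for illustration.

```python
import random

def train_linear_svm(points, labels, lr=0.01, reg=0.01, epochs=50, seed=0):
    """Minimise hinge loss + reg * ||w||^2 with per-sample subgradient steps."""
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    data = list(zip(points, labels))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            score = w[0] * x[0] + w[1] * x[1] + b
            # Gradient of the regulariser reg * ||w||^2.
            grad_w = [2.0 * reg * w[0], 2.0 * reg * w[1]]
            grad_b = 0.0
            if y * score < 1.0:
                # Margin violated: the hinge term contributes -y * x and -y.
                grad_w = [grad_w[0] - y * x[0], grad_w[1] - y * x[1]]
                grad_b = -y
            w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
            b -= lr * grad_b
    return w, b

# Synthetic data: class +1 when the second coordinate exceeds the first.
rng = random.Random(1)
points = [[(rng.random() - 0.5) * 20, (rng.random() - 0.5) * 20] for _ in range(200)]
labels = [1 if p[1] > p[0] else -1 for p in points]
w, b = train_linear_svm(points, labels)
accuracy = sum(
    (1 if w[0] * p[0] + w[1] * p[1] + b > 0 else -1) == y
    for p, y in zip(points, labels)
) / len(points)
```

The learned `w` should point roughly along (-1, 1), the normal of the line separating the two classes; points very close to that line may still be misclassified.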

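Test results

Building the input data

We construct a linear classification problem in which points with $y<x$ belong to class -1 and points with $y>x$ belong to class 1. Note that x and y here simply denote the first and second coordinates of the 2-D data; both are part of the input, while the class (-1 or 1) is the ground-truth label. The generator itself emits the label as 0.0 or 1.0, which is the format tf.losses.hinge_loss expects; it converts {0, 1} to {-1, 1} internally. The data generator is: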

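import numpy as np

def generate_point():
    while True:
        # sample a point uniformly from [-10, 10) x [-10, 10)
        point = (np.random.rand(2) - 0.5) * 20
        # label is 1.0 when the second coordinate exceeds the first, else 0.0
        label = point[1] > point[0]
        yield point, label.astype(np.float32)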
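
The generator yields labels as 0.0 or 1.0, whereas the hinge formula is written for labels in {-1, +1}. The sketch below shows the mapping between the two conventions. The helper names `binary_label` and `to_signed` are hypothetical and exist only for this illustration.

```python
def binary_label(point):
    """1.0 when the second coordinate exceeds the first, else 0.0 (same rule as the generator)."""
    return float(point[1] > point[0])

def to_signed(label):
    """Map a {0, 1} label to the {-1, +1} convention used in the hinge formula."""
    return 2.0 * label - 1.0

above = to_signed(binary_label([2.0, 5.0]))   # second coordinate larger -> +1.0
below = to_signed(binary_label([5.0, 2.0]))   # second coordinate smaller -> -1.0
```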

Use tensorflow.data to turn the generator into a stream of batches:

    gen = generate_point
    data = tf.data.Dataset.from_generator(gen, (tf.float32, tf.float32))
    data = data.batch(batchsize).prefetch(2)
    data = data.make_one_shot_iterator()
    next_x, next_y = data.get_next()
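
To see what this pipeline hands to the model, here is a rough pure-Python analogue of the generator plus batching. It is only an illustration of the idea, not how tf.data works internally; the `batches` helper, the `rng` argument, and the batch size of 4 are assumptions.

```python
import itertools
import random

def generate_point(rng):
    """Plain-Python stand-in for the generator above: yields (point, label) forever."""
    while True:
        point = [(rng.random() - 0.5) * 20 for _ in range(2)]
        yield point, float(point[1] > point[0])

def batches(generator, batch_size):
    """Group an endless sample generator into (inputs, labels) batches."""
    while True:
        chunk = list(itertools.islice(generator, batch_size))
        xs = [point for point, _ in chunk]
        ys = [label for _, label in chunk]
        yield xs, ys

stream = batches(generate_point(random.Random(0)), batch_size=4)
next_x, next_y = next(stream)   # one batch: 4 points and their 4 labels
```

Each call to `next(stream)` plays the role of one evaluation of `next_x, next_y` in the TensorFlow graph.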

Complete implementation

Below is the complete code for the linear SVM classifier:

import tensorflow as tf
import numpy as np
import os
import matplotlib.pyplot as plt


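def generate_point():
    while True:
        # sample a point uniformly from [-10, 10) x [-10, 10)
        point = (np.random.rand(2) - 0.5) * 20
        # label is 1.0 when the second coordinate exceeds the first, else 0.0
        label = point[1] > point[0]
        yield point, label.astype(np.float32)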


def SVM():
    batchsize = 256
    # build the input pipeline
    gen = generate_point
    data = tf.data.Dataset.from_generator(gen, (tf.float32, tf.float32))
    data = data.batch(batchsize).prefetch(2)
    data = data.make_one_shot_iterator()
    next_x, next_y = data.get_next()

    # build the SVM
    w = tf.Variable(tf.random_uniform([1, 2]))
    b = tf.Variable(tf.random_uniform((1,)))
    logit = tf.matmul(w, next_x, transpose_b=True) + b  # w x + b, shape [1, batch]
    logit = tf.tanh(logit)
    prediction = tf.tanh(tf.matmul(w, next_x, transpose_b=True) + b)
    # hinge loss plus an L2 penalty on w (tf.norm returns ||w||, not ||w||^2)
    loss = tf.losses.hinge_loss(next_y, logit) + tf.norm(w, 2)
    opt = tf.train.RMSPropOptimizer(1e-2).minimize(loss)

    # draw a fixed set of test points from a single generator instance
    points = gen()
    xy = np.stack([next(points)[0] for _ in range(batchsize)], 0)
    i = 0
    gpu_options = tf.GPUOptions(allow_growth=True)
    with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
        sess.run(tf.global_variables_initializer())
        # runs until interrupted; close the plot window to advance one step
        while True:
            sess.run(opt)
            # feed the fixed test points in place of the pipeline output
            prd = np.squeeze(sess.run(prediction, {next_x: xy}))
            colors = np.where(prd > 0, 'red', 'green').tolist()
            plt.scatter(xy[:, 0], xy[:, 1], c=colors)
            plt.title('step %d' % i)
            plt.show()
            i += 1


if __name__ == '__main__':
    os.environ['CUDA_VISIBLE_DEVICES'] = '1'
    SVM()

Results

[Figure: plot of the classification result]

Why there is no nonlinear SVM

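SVM's nonlinearity comes from kernel functions. Once a kernel is in place, prediction still needs the labels y and the training points x (at least the support vectors). In other words, the cost of a nonlinear SVM depends not only on the model itself but also on the amount of training data: each prediction evaluates the kernel against every support vector, so prediction time grows roughly linearly with their number, and the kernel matrix used during training grows quadratically with the number of samples. This leads to an awkward situation: in theory, more training data gives better generalization, yet more training data also slows down the nonlinear SVM's predictions.
Because of this, I consider the method too limited in use, so I did not implement it.
If you are interested, see tensorflow_cookbook/nonliner_SVM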
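
The dependence on training data can be seen in a small pure-Python sketch of a kernel decision function. This is an illustration only: the RBF kernel, `gamma`, the support points, and the multipliers `alphas` are made-up values, not a trained model, and `kernel_predict` is a hypothetical name.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def kernel_predict(x, support_x, support_y, alphas, bias=0.0):
    """Decision value sum_i alpha_i * y_i * K(x_i, x) + bias.

    Also returns how many kernel evaluations the prediction needed.
    """
    value, evaluations = bias, 0
    for xi, yi, ai in zip(support_x, support_y, alphas):
        value += ai * yi * rbf_kernel(xi, x)
        evaluations += 1
    return value, evaluations

# Made-up support vectors and multipliers (assumptions for illustration).
support_x = [[0.0, 1.0], [1.0, 0.0], [0.0, 2.0], [2.0, 0.0]]
support_y = [1, -1, 1, -1]
alphas = [1.0, 1.0, 1.0, 1.0]
value_above, evaluations = kernel_predict([0.0, 1.5], support_x, support_y, alphas)
value_below, _ = kernel_predict([1.5, 0.0], support_x, support_y, alphas)
```

Every prediction loops over all stored support vectors, so the number of kernel evaluations equals their count. The linear model, by contrast, only needs `w` and `b` at prediction time.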

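Conclusion

I wrote this post so that more people who are new to tf can see that tensorflow is not only a deep learning framework; more than that, in line with Google's original vision, it is meant to be a general-purpose machine learning toolkit. You can use it to implement AdaBoost, random forests, conditional random fields, naive Bayes, and pretty much any classic machine learning algorithm you can think of.
When I first picked up tensorflow I also thought the library was too big and had too many redundant APIs. In practice, though, that redundancy also opens up more possibilities. So keep at it: the more you work with this library, the more you will come to like it!

Finally, I wish you good health. Goodbye!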

[The Road to TensorFlow Applications] A complete SVM classifier in 10 lines of code