A basic soft-margin kernel SVM implementation in Python

http://tullo.ch/articles/svm-py/

https://github.com/ajtulloch/svmpy

Support vector machines (SVMs) are a family of excellent supervised learning algorithms: they can be trained efficiently for classification and regression, and they perform very well in practice. They are also grounded in convex optimization and Hilbert space theory, and there is a great deal of beautiful mathematics in the derivation of the various aspects of the training algorithm, which we will cover in subsequent posts.

For now, I will briefly outline the basic theory of the soft-margin kernel SVM. The classical treatment starts with the hard-margin linear SVM and then introduces the kernel trick and the soft-margin formulation, so this presentation moves somewhat faster than others.

We consider our training set to be

\begin{equation} D = \{ (\mathbf{x}_{i}, y_{i}) \mid \mathbf{x}_{i} \in \mathbb{R}^d, y_{i} \in \{ -1, 1 \} \}. \end{equation}

The key idea is that we seek a hyperplane $w$ separating our data, and we maximize the margin of this hyperplane to optimize decision-theoretic metrics.

Let $\kappa$ be a kernel function on $\mathbb{R}^d \times \mathbb{R}^d$ - a function such that the matrix $K$ with $K_{ij} = \kappa(x_i, x_j)$ is positive semidefinite. A key property of such kernel functions is that there exists a map $\nu$ such that $\langle \nu(x), \nu(y) \rangle = \kappa(x, y)$. One can think of $\nu$ as mapping our input features into a higher-dimensional output space.

We can show that, for a given feature mapping $\nu$ satisfying the previous condition, the Lagrangian dual of the problem of finding the maximum-margin hyperplane takes the form

\begin{equation} \inf_{z \in \mathbb{R}^n} \frac{1}{2} \left\| \sum_{i=1}^{n} y_i \nu(x_i) z_i \right\|_2^2 - e^T z \end{equation}

subject to $z \geq 0$ and $\langle z, y \rangle = 0$.

Given a resulting vector of Lagrange multipliers $z$, we find that most components of $z$ are zero. This follows from the complementary slackness conditions of the optimization problem: either $(x_i, y_i)$ lies on the maximum margin (and the corresponding Lagrange multiplier is nonzero), or it does not lie on the margin (and the Lagrange multiplier is zero).

The prediction for a given feature vector $x$ takes the form

\begin{align} \langle w, \nu(x) \rangle &= \sum_{i=1}^{n} z_{i} y_{i} \langle \nu(x_{i}), \nu(x) \rangle \\ &= \sum_{i=1}^{n} z_{i} y_{i} \kappa(x_{i}, x), \end{align}

where the sum need only run over the non-zero $z_{i}$.

This yields a very efficient prediction algorithm: once we have trained our SVM, a large amount of the training data (the samples with zero Lagrange multipliers) can be discarded.

There are more complicated issues (handling the bias term, dealing with non-separable datasets), but this is the gist of the algorithm.
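Expanding the squared norm in terms of the kernel makes the connection to the quadratic program solved in the implementation below explicit:

\begin{equation} \frac{1}{2} \left\| \sum_{i=1}^{n} y_i \nu(x_i) z_i \right\|_2^2 - e^T z = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} z_i z_j \, y_i y_j \, \kappa(x_i, x_j) - \sum_{i=1}^{n} z_i = \frac{1}{2} z^T P z + q^T z, \end{equation}

with $P_{ij} = y_i y_j K_{ij}$ and $q = -e$. Together with the linear constraints $z \geq 0$, $z \leq C e$ (the soft-margin box constraint) and $y^T z = 0$, this is exactly the data $(P, q, G, h, A, b)$ handed to the cvxopt QP solver in the training code.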

Implementation

The full implementation of training in Python (using cvxopt as the quadratic program solver) is given below:

import numpy as np
import cvxopt
import cvxopt.solvers

# Lagrange multipliers below this numerical tolerance are treated as zero
# (the value here is a reasonable default; adjust as needed).
MIN_SUPPORT_VECTOR_MULTIPLIER = 1e-5


class SVMTrainer(object):
    def __init__(self, kernel, c):
        self._kernel = kernel
        self._c = c

    def train(self, X, y):
        """Given the training features X with labels y, returns a SVM
        predictor representing the trained SVM.
        """
        lagrange_multipliers = self._compute_multipliers(X, y)
        return self._construct_predictor(X, y, lagrange_multipliers)

    def _gram_matrix(self, X):
        n_samples, n_features = X.shape
        K = np.zeros((n_samples, n_samples))
        # TODO(tulloch) - vectorize
        for i, x_i in enumerate(X):
            for j, x_j in enumerate(X):
                K[i, j] = self._kernel(x_i, x_j)
        return K

    def _construct_predictor(self, X, y, lagrange_multipliers):
        support_vector_indices = \
            lagrange_multipliers > MIN_SUPPORT_VECTOR_MULTIPLIER

        support_multipliers = lagrange_multipliers[support_vector_indices]
        support_vectors = X[support_vector_indices]
        support_vector_labels = y[support_vector_indices]

        # http://www.cs.cmu.edu/~guestrin/Class/10701-S07/Slides/kernels.pdf
        # bias = y_k - \sum z_i y_i  K(x_k, x_i)
        # Thus we can just predict an example with bias of zero, and
        # compute error.
        bias = np.mean(
            [y_k - SVMPredictor(
                kernel=self._kernel,
                bias=0.0,
                weights=support_multipliers,
                support_vectors=support_vectors,
                support_vector_labels=support_vector_labels).predict(x_k)
             for (y_k, x_k) in zip(support_vector_labels, support_vectors)])

        return SVMPredictor(
            kernel=self._kernel,
            bias=bias,
            weights=support_multipliers,
            support_vectors=support_vectors,
            support_vector_labels=support_vector_labels)

    def _compute_multipliers(self, X, y):
        n_samples, n_features = X.shape

        K = self._gram_matrix(X)
        # Solves
        # min 1/2 x^T P x + q^T x
        # s.t.
        #  Gx \coneleq h
        #  Ax = b

        P = cvxopt.matrix(np.outer(y, y) * K)
        q = cvxopt.matrix(-1 * np.ones(n_samples))

        # -z_i \leq 0, i.e. each Lagrange multiplier is non-negative
        G_std = cvxopt.matrix(np.diag(np.ones(n_samples) * -1))
        h_std = cvxopt.matrix(np.zeros(n_samples))

        # z_i \leq c - the soft-margin upper bound
        G_slack = cvxopt.matrix(np.diag(np.ones(n_samples)))
        h_slack = cvxopt.matrix(np.ones(n_samples) * self._c)

        G = cvxopt.matrix(np.vstack((G_std, G_slack)))
        h = cvxopt.matrix(np.vstack((h_std, h_slack)))

        A = cvxopt.matrix(y, (1, n_samples))
        b = cvxopt.matrix(0.0)

        solution = cvxopt.solvers.qp(P, q, G, h, A, b)

        # Lagrange multipliers
        return np.ravel(solution['x'])

The code is fairly self-explanatory and follows the given training algorithm quite closely. To compute our Lagrange multipliers, we simply construct the Gram matrix and solve the given QP. We then pass the trained support vectors, their labels, and the corresponding Lagrange multipliers (the weights) to the SVMPredictor, whose implementation is given below.

class SVMPredictor(object):
    def __init__(self,
                 kernel,
                 bias,
                 weights,
                 support_vectors,
                 support_vector_labels):
        self._kernel = kernel
        self._bias = bias
        self._weights = weights
        self._support_vectors = support_vectors
        self._support_vector_labels = support_vector_labels

    def predict(self, x):
        """
        Computes the SVM prediction sign(bias + sum_i z_i y_i kernel(x_i, x))
        for the given feature vector x.
        """
        result = self._bias
        for z_i, x_i, y_i in zip(self._weights,
                                 self._support_vectors,
                                 self._support_vector_labels):
            result += z_i * y_i * self._kernel(x_i, x)
        return np.sign(result).item()

This simply implements the prediction equation given above.
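Putting the two classes together, a minimal usage sketch might look like the following; the toy data, the inline linear kernel, and the choice of c = 1.0 are illustrative assumptions, not values from the original post:

import numpy as np

# Four toy points in R^2, separable by the hyperplane x_1 = 0.
X = np.array([[1.0, 1.0],
              [2.0, -1.0],
              [-1.0, -1.0],
              [-2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])  # labels as doubles, as cvxopt expects

# np.inner is the linear kernel; Kernel.linear() below is equivalent.
trainer = SVMTrainer(kernel=np.inner, c=1.0)
predictor = trainer.train(X, y)

print(predictor.predict(np.array([3.0, 0.0])))   # expected: 1.0
print(predictor.predict(np.array([-3.0, 0.0])))  # expected: -1.0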

A sample list of kernel functions is given below:

import numpy as np
import numpy.linalg as la


class Kernel(object):
    """Implements list of kernels from
    http://en.wikipedia.org/wiki/Support_vector_machine
    """
    @staticmethod
    def linear():
        def f(x, y):
            return np.inner(x, y)
        return f

    @staticmethod
    def gaussian(sigma):
        def f(x, y):
            # Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))
            exponent = -la.norm(x - y) ** 2 / (2 * sigma ** 2)
            return np.exp(exponent)
        return f

    @staticmethod
    def _polykernel(dimension, offset):
        def f(x, y):
            return (offset + np.dot(x, y)) ** dimension
        return f

    @staticmethod
    def inhomogenous_polynomial(dimension):
        return Kernel._polykernel(dimension=dimension, offset=1.0)

    @staticmethod
    def homogenous_polynomial(dimension):
        return Kernel._polykernel(dimension=dimension, offset=0.0)

    @staticmethod
    def hyperbolic_tangent(kappa, c):
        def f(x, y):
            return np.tanh(kappa * np.dot(x, y) + c)
        return f
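As a quick sanity check of the positive-semidefiniteness property mentioned earlier, one can build a Gram matrix from random points with one of these kernels and inspect its eigenvalues. A minimal sketch (the point count and sigma are arbitrary choices):

import numpy as np
import numpy.linalg as la

# Gram matrix of the Gaussian kernel on a handful of random points.
X = np.random.randn(20, 3)
kernel = Kernel.gaussian(sigma=1.0)
K = np.array([[kernel(x_i, x_j) for x_j in X] for x_i in X])

# A valid kernel yields a positive semidefinite (symmetric) Gram matrix,
# so all eigenvalues should be non-negative up to numerical error.
print(la.eigvalsh(K).min() >= -1e-10)  # expected: True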

Demonstration
We draw pairs of independent standard normal variables as features, and label each sample $y_i = \operatorname{sign}(\sum x)$ - the sign of the sum of its features. This is linearly separable, so we train a linear SVM (where $\kappa(x_i, x_j) = \langle x_i, x_j \rangle$) on the sample data.

We then use matplotlib to visualize the samples and the decision boundary of the SVM on this dataset. See the accompanying gist for implementation details.
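A minimal sketch of such a demonstration, omitting the plotting code; the sample count and c = 0.1 are illustrative assumptions, not the settings of the original gist:

import numpy as np

num_samples = 200
num_features = 2

# Pairs of independent standard normal variables as features,
# labelled by the sign of the sum of the features.
X = np.random.normal(size=(num_samples, num_features))
y = np.sign(np.sum(X, axis=1))

trainer = SVMTrainer(Kernel.linear(), c=0.1)
predictor = trainer.train(X, y)

predictions = np.array([predictor.predict(x) for x in X])
print("training accuracy:", np.mean(predictions == y))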

Sample output from this demonstration (the samples and the learned decision boundary) is shown in the original post.

More Information

See the svmpy library on GitHub for all code used in this post.
