Andrew Ng's Machine Learning Notes: SVM Step 3, Solving the Hard-Margin and Soft-Margin Problems

Hard margin

We first state the hard-margin problem, which we label Eq. (1):

$$\min_{w,b} \frac{1}{2}||w||^2 \tag{1}$$

$$\text{s.t.} \quad y^{(i)}(w^Tx^{(i)}+b) \ge 1, \quad i = 1, \dots, m$$
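The constraint in Eq. (1) can be checked numerically for any candidate hyperplane. The snippet below is a minimal sketch in plain Python. The helper names (`functional_margins`, `is_feasible`, `objective`) and the two-point toy dataset are assumptions made for illustration and do not come from the original notes.

```python
def functional_margins(w, b, X, y):
    """Return y_i * (w . x_i + b) for every sample."""
    return [yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            for xi, yi in zip(X, y)]


def is_feasible(w, b, X, y, tol=1e-9):
    """True if every sample satisfies the hard-margin constraint of Eq. (1)."""
    return all(m >= 1.0 - tol for m in functional_margins(w, b, X, y))


def objective(w):
    """Hard-margin objective 0.5 * ||w||^2."""
    return 0.5 * sum(wk * wk for wk in w)


# Toy data (assumed for illustration): one positive and one negative sample.
X = [(2.0, 2.0), (0.0, 0.0)]
y = [1, -1]
w, b = (0.5, 0.5), -1.0

margins = functional_margins(w, b, X, y)   # both samples sit exactly on the margin
feasible = is_feasible(w, b, X, y)
value = objective(w)
```

Any `(w, b)` for which `is_feasible` returns `True` is a candidate. The hard-margin problem picks the feasible candidate with the smallest `objective`.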

Construct the Lagrangian, with one multiplier \(\alpha_i \ge 0\) per constraint:

$$L(w,b,\alpha) = \frac{1}{2}||w||^2 + \sum_{i=1}^m \alpha_i [1-y^{(i)}(w^Tx^{(i)} + b)] \tag{2}$$

By the method of Lagrange multipliers, the problem can be transformed as follows:

$$\text{primal problem} \rightarrow \min_{w,b} \max_{\alpha \ge 0} L(w,b,\alpha) \rightarrow \max_{\alpha \ge 0} \min_{w,b}L(w,b,\alpha)$$

First solve the inner minimization \(\min_{w,b} L(w,b,\alpha)\) by setting \(\frac{\partial}{\partial w} L(w,b,\alpha) =0\) and \(\frac{\partial}{\partial b} L(w,b,\alpha) =0 \). Solving these two equations gives:

$$\begin{align} &\frac{\partial}{\partial w} L(w,b,\alpha) = w - \sum_{i=1}^m \alpha_i y^{(i)}x^{(i)} = 0 \Longrightarrow w=\sum_{i=1}^m \alpha_iy^{(i)}x^{(i)}  \tag{3} \\ &\frac{\partial}{\partial b} L(w,b,\alpha) = -\sum_{i=1}^m \alpha_iy^{(i)} = 0 \Longrightarrow \sum_{i=1}^m \alpha_i y^{(i)} = 0 \tag{4}\end{align}$$
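The signs of these two derivatives are easy to get wrong, so a quick finite-difference check is worthwhile. The following is a sketch under assumed values: the function names and the numbers for `w`, `b`, `alpha`, `X`, `y` are arbitrary illustrations, not taken from the notes. It compares the analytic gradients \(w - \sum_i \alpha_i y^{(i)} x^{(i)}\) and \(-\sum_i \alpha_i y^{(i)}\) with central differences of the Lagrangian in Eq. (2).

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def lagrangian(w, b, alpha, X, y):
    """Eq. (2): 0.5*||w||^2 + sum_i alpha_i * (1 - y_i * (w . x_i + b))."""
    penalty = sum(a * (1.0 - yi * (dot(w, xi) + b))
                  for a, xi, yi in zip(alpha, X, y))
    return 0.5 * dot(w, w) + penalty


def grad_analytic(w, alpha, X, y):
    """Analytic partial derivatives of Eq. (2) with respect to w and b."""
    grad_w = [w[k] - sum(a * yi * xi[k] for a, xi, yi in zip(alpha, X, y))
              for k in range(len(w))]
    grad_b = -sum(a * yi for a, yi in zip(alpha, y))
    return grad_w, grad_b


def grad_numeric(w, b, alpha, X, y, h=1e-6):
    """Central finite differences of the Lagrangian."""
    grad_w = []
    for k in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[k] += h
        w_minus[k] -= h
        diff = lagrangian(w_plus, b, alpha, X, y) - lagrangian(w_minus, b, alpha, X, y)
        grad_w.append(diff / (2.0 * h))
    diff_b = lagrangian(w, b + h, alpha, X, y) - lagrangian(w, b - h, alpha, X, y)
    return grad_w, diff_b / (2.0 * h)


# Arbitrary test point (assumed values, not a solution of the SVM problem).
X = [(2.0, 2.0), (0.0, 0.0), (1.0, 3.0)]
y = [1, -1, 1]
alpha = [0.3, 0.5, 0.4]
w, b = [0.7, -0.4], 0.1

gw_a, gb_a = grad_analytic(w, alpha, X, y)
gw_n, gb_n = grad_numeric(w, b, alpha, X, y)
```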

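Substitute conditions (3) and (4) into the Lagrangian and simplify. By (3), \(\sum_i \alpha_i y^{(i)} w^Tx^{(i)} = w^Tw\), and by (4) the term containing \(b\) vanishes:

$$ \begin{align} L(w,b,\alpha) &= \frac{1}{2} w^Tw - \sum_{i=1}^m \alpha_i y^{(i)} w^Tx^{(i)} - b\sum_{i=1}^m \alpha_i y^{(i)} + \sum_{i=1}^m \alpha_i \tag{5} \\ &= \frac{1}{2} w^Tw - w^Tw + \sum_{i=1}^m \alpha_i \\ &= \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i\alpha_j y^{(i)}y^{(j)} (x^{(i)})^Tx^{(j)} \tag{6} \end{align}$$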

At this point the problem in Eq. (1) has been turned into the following form:

$$\max_{\alpha} \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m}\alpha_i\alpha_j y^{(i)}y^{(j)} (x^{(i)})^Tx^{(j)} \tag{7}$$

$$\begin{align} \text{s.t.} \quad &\sum_{i=1}^m \alpha_i y^{(i)} = 0 \\ &\alpha_i \ge 0, \quad i = 1, \dots, m\end{align}$$
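A small worked example may help here. The two-point dataset is an assumption chosen for illustration and is not part of the original notes. Take \(x^{(1)} = (2,2)^T\), \(y^{(1)} = +1\) and \(x^{(2)} = (0,0)^T\), \(y^{(2)} = -1\). The equality constraint forces \(\alpha_1 = \alpha_2 = a\), so the dual objective reduces to a function of one variable:

$$\sum_{i} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i\alpha_j y^{(i)}y^{(j)} (x^{(i)})^Tx^{(j)} = 2a - \frac{1}{2}a^2 \|x^{(1)} - x^{(2)}\|^2 = 2a - 4a^2$$

This is maximized at \(a = \frac{1}{4}\), which satisfies \(a \ge 0\). Then

$$w = \sum_{i} \alpha_i y^{(i)} x^{(i)} = \tfrac{1}{4}(2,2)^T = (\tfrac{1}{2}, \tfrac{1}{2})^T$$

Both samples have \(\alpha_i > 0\), so by the standard KKT complementary-slackness condition they lie exactly on the margin. From \(y^{(1)}(w^Tx^{(1)} + b) = 1\) we get \(b = -1\). The dual value \(2a - 4a^2 = \frac{1}{4}\) equals the primal value \(\frac{1}{2}\|w\|^2 = \frac{1}{4}\), as expected.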

Next, the SMO algorithm is used to solve for \(\alpha\). Then \(w\) and \(b\) are recovered from \(\alpha\), which gives us the separating hyperplane.
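The recovery step can be sketched as follows, assuming \(\alpha\) is already known. The notes do not spell out how \(b\) is obtained. The sketch uses the standard KKT argument that any sample with \(\alpha_i > 0\) satisfies \(y^{(i)}(w^Tx^{(i)} + b) = 1\), hence \(b = y^{(i)} - w^Tx^{(i)}\). The function names and the toy data are assumptions for illustration, and the value \(\alpha = (0.25, 0.25)\) is the hand-computed dual optimum for this particular two-point set.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def dual_objective(alpha, X, y):
    """Objective of the dual problem in Eq. (7)."""
    m = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(m) for j in range(m))
    return sum(alpha) - 0.5 * quad


def recover_w(alpha, X, y):
    """w = sum_i alpha_i * y_i * x_i, from Eq. (3)."""
    return [sum(a * yi * xi[k] for a, xi, yi in zip(alpha, X, y))
            for k in range(len(X[0]))]


def recover_b(alpha, w, X, y, tol=1e-8):
    """Average y_i - w . x_i over the support vectors (alpha_i > 0).

    Assumes at least one alpha_i is strictly positive.
    """
    support = [i for i, a in enumerate(alpha) if a > tol]
    return sum(y[i] - dot(w, X[i]) for i in support) / len(support)


# Toy data (assumed) and its hand-computed dual solution.
X = [(2.0, 2.0), (0.0, 0.0)]
y = [1, -1]
alpha = [0.25, 0.25]

w = recover_w(alpha, X, y)
b = recover_b(alpha, w, X, y)
dual_value = dual_objective(alpha, X, y)
```

Averaging over all support vectors rather than picking one is a common choice for numerical robustness. With exact arithmetic every support vector yields the same \(b\).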

Soft margin

We first state the soft-margin problem, which we label Eq. (8):

$$\min_{w,b,\xi} \frac{1}{2}||w||^2 + C\sum_{i=1}^m \xi_i \tag{8}$$

$$\begin{align} \text{s.t.} \quad &y^{(i)}(w^Tx^{(i)} + b) \ge 1 - \xi_i \\ &\xi_i \ge 0, \quad i = 1, \dots, m \end{align}$$

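Compared with the hard margin, the soft margin only adds one slack variable \(\xi_i\) per sample. Subtracting \(\xi_i\) on the right-hand side of the constraint means that we now allow some samples to have a functional margin smaller than 1. This 1 is the functional margin of the whole training set with respect to the separating hyperplane, which we fixed to 1 when deriving the SVM problem. At the same time, the objective adds the sum of all \(\xi_i\), weighted by \(C\). Every sample whose functional margin falls below 1 (possibly below 0, in which case it is misclassified) therefore increases the objective. This penalty ensures that only some of the samples violate the margin, because otherwise the objective could not reach its minimum.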
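For a fixed hyperplane, the smallest slack that satisfies both constraints is \(\xi_i = \max(0,\, 1 - y^{(i)}(w^Tx^{(i)} + b))\). The sketch below computes these slacks and the objective of Eq. (8). The function names, the three-point dataset, and the choice \(C = 1\) are illustrative assumptions.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def slacks(w, b, X, y):
    """Smallest xi_i >= 0 with y_i * (w . x_i + b) >= 1 - xi_i."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]


def soft_margin_objective(w, b, X, y, C):
    """Objective of Eq. (8) with the slacks set to their minimal values."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, X, y))


# Toy data (assumed): the third sample falls on the wrong side of the hyperplane.
X = [(2.0, 2.0), (0.0, 0.0), (1.0, 0.5)]
y = [1, -1, 1]
w, b = (0.5, 0.5), -1.0

xi = slacks(w, b, X, y)                               # only the third slack is nonzero
value = soft_margin_objective(w, b, X, y, C=1.0)
```

A larger `C` makes each unit of slack more expensive, which pushes the solution toward the hard-margin behaviour.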

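Solving the soft-margin problem is similar to solving the hard-margin problem. The first step is to construct the Lagrangian, with multipliers \(\alpha_i \ge 0\) and \(\gamma_i \ge 0\):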

$$ L(w,b,\xi, \alpha,\gamma) = \frac{1}{2}||w||^2 + C\sum_{i=1}^m \xi_i + \sum_{i=1}^m \alpha_i [1- \xi_i - y^{(i)}(w^Tx^{(i)} + b)] - \sum_{i=1}^m \gamma_i \xi_i\tag{9}$$

By the method of Lagrange multipliers, the problem can be transformed as follows:

$$\text{primal problem} \rightarrow \min_{w,b,\xi} \max_{\alpha,\gamma \ge 0} L(w,b,\xi,\alpha,\gamma) \rightarrow \max_{\alpha,\gamma \ge 0} \min_{w,b,\xi}L(w,b,\xi,\alpha,\gamma)$$

Next, solve the inner minimization \(\min_{w,b,\xi} L(w,b,\xi, \alpha, \gamma)\) by setting \(\frac{\partial}{\partial w}L(w,b,\xi,\alpha,\gamma)=0\), \(\frac{\partial}{\partial b}L(w,b,\xi,\alpha,\gamma) = 0\), and \(\frac{\partial}{\partial \xi_i} L(w,b,\xi,\alpha,\gamma) = 0\) for every \(i\). Solving these equations gives the conditions we label Eq. (10):

$$\begin{align} &\frac{\partial}{\partial w}L(w,b,\xi,\alpha,\gamma) = w - \sum_{i=1}^m \alpha_i y^{(i)} x^{(i)} = 0 \Longrightarrow w = \sum_{i=1}^m \alpha_i y^{(i)} x^{(i)} \\ &\frac{\partial}{\partial b}L(w,b,\xi,\alpha, \gamma) = -\sum_{i=1}^m \alpha_i y^{(i)} = 0 \Longrightarrow \sum_{i=1}^m \alpha_i y^{(i)}=0 \tag{10}\\ &\frac{\partial}{\partial \xi_i}L(w,b,\xi,\alpha,\gamma) = C - \alpha_i - \gamma_i = 0 \Longrightarrow \gamma_i = C - \alpha_i, \quad i = 1, \dots, m\end{align}$$

Then use the three conditions in Eq. (10) to simplify the Lagrangian in Eq. (9). The second line uses \(C\xi_i - \gamma_i\xi_i = \alpha_i\xi_i\) for each \(i\):

$$\begin{align} L(w,b,\xi,\alpha,\gamma) &= \frac{1}{2}w^Tw + \sum_{i=1}^m \alpha_i - \sum_{i=1}^m \alpha_i\xi_i - \sum_{i=1}^m \alpha_iy^{(i)}w^Tx^{(i)} - b\sum_{i=1}^m \alpha_iy^{(i)} + C\sum_{i=1}^m \xi_i - \sum_{i=1}^m \gamma_i \xi_i \\ &= \frac{1}{2}w^Tw + \sum_{i=1}^m \alpha_i - \sum_{i=1}^m \alpha_i\xi_i - \sum_{i=1}^m \alpha_iy^{(i)}w^Tx^{(i)} - b\sum_{i=1}^m \alpha_iy^{(i)} + \sum_{i=1}^m \alpha_i\xi_i \\ &= \frac{1}{2}w^Tw + \sum_{i=1}^m \alpha_i - \sum_{i=1}^m \alpha_iy^{(i)}w^Tx^{(i)} \\ &= \sum_{i=1}^m\alpha_i - \frac{1}{2}\sum_{i,j=1}^m \alpha_i\alpha_j y^{(i)}y^{(j)} (x^{(i)})^Tx^{(j)}\end{align}$$
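The cancellation of \(\xi\) and \(b\) can be confirmed numerically. In the sketch below all names and numbers are illustrative assumptions. `alpha` is chosen so that \(\sum_i \alpha_i y^{(i)} = 0\), `gamma` is set to \(C - \alpha_i\), and `w` to \(\sum_i \alpha_i y^{(i)} x^{(i)}\). The Lagrangian of Eq. (9) should then equal the final dual expression for any values of `xi` and `b`.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def soft_lagrangian(w, b, xi, alpha, gamma, X, y, C):
    """Eq. (9), evaluated term by term."""
    constraint_terms = sum(a * (1.0 - s - yi * (dot(w, x) + b))
                           for a, s, x, yi in zip(alpha, xi, X, y))
    slack_terms = sum(g * s for g, s in zip(gamma, xi))
    return 0.5 * dot(w, w) + C * sum(xi) + constraint_terms - slack_terms


def dual_value(alpha, X, y):
    """sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j x_i . x_j"""
    m = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(m) for j in range(m))
    return sum(alpha) - 0.5 * quad


# Assumed values; alpha satisfies sum_i alpha_i * y_i = 0 and 0 <= alpha_i <= C.
X = [(2.0, 2.0), (0.0, 0.0), (1.0, 3.0)]
y = [1, -1, 1]
C = 1.0
alpha = [0.3, 0.5, 0.2]
gamma = [C - a for a in alpha]                        # stationarity in xi_i
w = [sum(a * yi * x[k] for a, x, yi in zip(alpha, X, y)) for k in range(2)]

# xi and b are arbitrary: they should not affect the value.
lagr_1 = soft_lagrangian(w, 3.7, [0.4, 0.0, 2.0], alpha, gamma, X, y, C)
lagr_2 = soft_lagrangian(w, -1.2, [5.0, 1.0, 0.0], alpha, gamma, X, y, C)
dual = dual_value(alpha, X, y)
```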

At this point the soft-margin problem in Eq. (8) has been turned into the following form:

$$\max_{\alpha} \sum_{i=1}^m \alpha_i - \frac{1}{2}\sum_{i,j=1}^m \alpha_i\alpha_j y^{(i)}y^{(j)} (x^{(i)})^Tx^{(j)}$$

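$$\begin{align} \text{s.t.} \quad &\sum_{i=1}^m \alpha_i y^{(i)} = 0 \\ &C-\alpha_i -\gamma_i = 0 \\ &\alpha_i \ge 0 \\ &\gamma_i \ge 0, \quad i = 1, \dots, m \end{align}$$

Since \(\gamma_i = C - \alpha_i \ge 0\), the last three constraints are equivalent to the box constraint \(0 \le \alpha_i \le C\), and \(\gamma\) no longer needs to appear explicitly.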

Next, SMO is again used to solve for \(\alpha\). From \(\alpha\) we then recover \(w\), \(b\), and \(\xi\), which determines our separating hyperplane.
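To make the idea concrete, here is a simplified pairwise coordinate-ascent sketch in the spirit of SMO. It is not Platt's full algorithm: it has no working-set heuristics and no error cache, and it simply sweeps over all pairs. Each step optimizes two multipliers analytically while keeping \(\sum_i \alpha_i y^{(i)} = 0\) and the box constraint \(0 \le \alpha_i \le C\), which follows from \(C - \alpha_i - \gamma_i = 0\) with \(\gamma_i \ge 0\). The function names, the linear kernel, the toy data, and \(C = 10\) are all assumptions for illustration.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def solve_dual_pairwise(X, y, C, max_sweeps=1000, tol=1e-10):
    """Maximize the soft-margin dual by repeated two-variable updates."""
    m = len(X)
    K = [[dot(X[i], X[j]) for j in range(m)] for i in range(m)]
    alpha = [0.0] * m
    for _ in range(max_sweeps):
        changed = False
        for i in range(m):
            for j in range(i + 1, m):
                # g_k = sum_l alpha_l y_l K[l][k]; the offset b cancels in E_i - E_j.
                g_i = sum(alpha[l] * y[l] * K[l][i] for l in range(m))
                g_j = sum(alpha[l] * y[l] * K[l][j] for l in range(m))
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= tol:
                    continue  # degenerate pair, skipped in this sketch
                # Bounds that keep both alpha_i and alpha_j inside [0, C].
                if y[i] != y[j]:
                    lo = max(0.0, alpha[j] - alpha[i])
                    hi = min(C, C + alpha[j] - alpha[i])
                else:
                    lo = max(0.0, alpha[i] + alpha[j] - C)
                    hi = min(C, alpha[i] + alpha[j])
                new_j = alpha[j] + y[j] * ((g_i - y[i]) - (g_j - y[j])) / eta
                new_j = min(hi, max(lo, new_j))
                if abs(new_j - alpha[j]) > tol:
                    # Move alpha_i so that sum_k alpha_k y_k stays unchanged.
                    alpha[i] += y[i] * y[j] * (alpha[j] - new_j)
                    alpha[j] = new_j
                    changed = True
        if not changed:
            break
    return alpha


def recover_hyperplane(alpha, X, y, C, tol=1e-8):
    """w from stationarity; b from free support vectors (0 < alpha_i < C).

    Assumes at least one free support vector exists.
    """
    w = [sum(a * yi * x[k] for a, x, yi in zip(alpha, X, y))
         for k in range(len(X[0]))]
    free = [i for i, a in enumerate(alpha) if tol < a < C - tol]
    b = sum(y[i] - dot(w, X[i]) for i in free) / len(free)
    return w, b


# Toy separable data (assumed); with a large C the soft margin matches the hard margin.
X = [(2.0, 2.0), (0.0, 0.0)]
y = [1, -1]
alpha = solve_dual_pairwise(X, y, C=10.0)
w, b = recover_hyperplane(alpha, X, y, C=10.0)
```

Once \(w\) and \(b\) are known, the slacks follow as \(\xi_i = \max(0,\, 1 - y^{(i)}(w^Tx^{(i)} + b))\). For real workloads a tested solver is preferable to this sketch.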

As we can see, the solution procedures for the soft margin and the hard margin are very similar.

Reposted from blog.csdn.net/wang2011210219/article/details/81236448