The perceptron model
- Input space: \(X\subseteq R^n\)
- Output space: \(Y=\{+1,-1\}\)
- \(f(x) = \mathrm{sign}(w\cdot x+b)\)
\[\mathrm{sign}(x) = \begin{cases} +1, & x \geq 0 \\ -1, & x < 0 \end{cases}\]
The hypothesis space is \(\{f\mid f(x)=w\cdot x+b\}\)
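The model above can be sketched in a few lines of NumPy (the helper names `sign` and `predict` are illustrative, not from the text):

```python
import numpy as np

def sign(a):
    # sign as defined above: +1 for a >= 0, -1 otherwise
    return 1 if a >= 0 else -1

def predict(w, b, x):
    # perceptron decision function f(x) = sign(w . x + b)
    return sign(np.dot(w, x) + b)
```

For example, with \(w=(1,-1)\), \(b=0.5\), the point \((2,1)\) gives \(\mathrm{sign}(2-1+0.5)=+1\).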
The perceptron learning strategy
Linear separability
There exists a hyperplane \(w\cdot x+b=0\) that puts the positive and negative examples on opposite sides.
A dataset is linearly separable if and only if the convex hulls of the positive and negative instance points are disjoint, where for \(S=\{x_1,\dots,x_k\}\)
\(conv(S) = \{x=\sum_{i=1}^{k}\lambda_i x_i\mid\sum_{i=1}^{k}\lambda_i = 1,\lambda_i \geq 0\}\)
Strategy
The distance from a point \(x_0\in X\) in the input space to the hyperplane is
\[\frac{1}{\|w\|}|w\cdot x_0+b|\]
For a misclassified point \((x_i,y_i)\),
\[-y_i(w\cdot x_i+b)> 0\]
Let \(M\) be the set of misclassified points; the total distance from the misclassified points to the hyperplane is
\[-\frac{1}{\|w\|}\sum_{x_i\in M}y_i(w\cdot x_i+b)\]
Dropping the factor \(\frac{1}{\|w\|}\), the loss function is
\[L(w,b) = -\sum_{x_i\in M}y_i(w\cdot x_i+b)\]
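This loss can be computed directly; a sketch (assuming the common convention that a point with \(y_i(w\cdot x_i+b)\leq 0\) counts as misclassified):

```python
import numpy as np

def perceptron_loss(w, b, X, y):
    # L(w,b) = - sum over misclassified points of y_i (w . x_i + b)
    margins = y * (X @ w + b)   # y_i (w . x_i + b) for every sample
    mis = margins <= 0          # misclassified points form the set M
    return -margins[mis].sum()
```

The loss is zero exactly when no point is misclassified, and positive otherwise.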
The perceptron algorithm
Solve the minimization problem
\[\min_{w,b} L(w,b) = -\sum_{x_i\in M}y_i(w\cdot x_i+b) \]
The gradients of the loss function:
\(\nabla_w L = -\sum_{x_i\in M}x_iy_i\)
\(\nabla_bL = -\sum_{x_i\in M}y_i\)
Stochastic gradient descent: pick a misclassified point \((x_i,y_i)\) and update
\(w\leftarrow w+ \eta y_ix_i\)
\(b\leftarrow b + \eta y_i\)
\(\eta\): learning rate
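The update rule above can be sketched as a full training loop (a hypothetical `train_perceptron` that cycles through the data until nothing is misclassified):

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    # Primal-form perceptron: pick each misclassified point (x_i, y_i)
    # and update w <- w + eta*y_i*x_i, b <- b + eta*y_i.
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for i in range(n):
            if y[i] * (np.dot(w, X[i]) + b) <= 0:   # misclassified
                w += eta * y[i] * X[i]
                b += eta * y[i]
                errors += 1
        if errors == 0:     # converged: no misclassified points remain
            break
    return w, b
```

On linearly separable data the loop terminates with a separating hyperplane; on non-separable data it stops at `max_epochs`.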
Proof of convergence
Novikoff's theorem. Write the augmented vectors \(\hat{w}=(w^T,b)^T\) and \(\hat{x}=(x^T,1)^T\), so that \(\hat{w}\cdot\hat{x}=w\cdot x+b\). If the training set is linearly separable, then:
(1) There exists a hyperplane \(\hat{w}_{opt}\cdot \hat{x} =0\) with \(\|\hat{w}_{opt}\|=1\) that separates the dataset completely correctly, and there exists \(\gamma>0\) such that for every sample point \((\hat{x}_i,y_i)\)
\(y_i(\hat{w}_{opt}\cdot \hat{x}_i)\geq \gamma\)
(2) Let \(R=\underset{i}{\max}\|\hat{x}_i\|\); then the number of misclassifications \(k\) satisfies
\[k\leq \left(\frac{R}{\gamma}\right)^2\]
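A quick numeric sanity check of the bound on a small hypothetical dataset, reusing the learned separator itself (normalized to unit norm) as a valid \(\hat{w}_{opt}\):

```python
import numpy as np

# Hypothetical linearly separable toy data.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

# Augmented samples x_hat = (x, 1), so b is folded into w_hat.
X_hat = np.hstack([X, np.ones((len(X), 1))])

# Run the perceptron (eta = 1) and count the number of updates k.
w_hat = np.zeros(X_hat.shape[1])
k = 0
changed = True
while changed:
    changed = False
    for x_i, y_i in zip(X_hat, y):
        if y_i * np.dot(w_hat, x_i) <= 0:   # misclassified
            w_hat += y_i * x_i
            k += 1
            changed = True

# Any unit-norm separating hyperplane gives a valid (gamma, R) pair;
# here we take the learned separator itself, normalized.
w_opt = w_hat / np.linalg.norm(w_hat)
gamma = min(y_i * np.dot(w_opt, x_i) for x_i, y_i in zip(X_hat, y))
R = max(np.linalg.norm(x_i) for x_i in X_hat)
# k stays well below the bound (R / gamma) ** 2 on this data
```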
Dual form
\(w\leftarrow w+ \eta y_ix_i\)
\(b\leftarrow b + \eta y_i\)
With the initialization \(w=0\), \(b=0\), the learned \(w, b\) can be written as
\(w = \sum_{i=1}^{N}\alpha_iy_ix_i\)
\(b = \sum_{i=1}^{N}\alpha_iy_i\)
\(\alpha_i \geq 0\); when \(\eta=1\), \(\alpha_i\) equals the number of times the \(i\)-th sample point triggered an update.
The perceptron model:
\(f(x)=\mathrm{sign}\left(\sum_{j=1}^{N}\alpha_jy_jx_j\cdot x+b\right)\)
\(\alpha = (\alpha_1,\alpha_2,\cdots,\alpha_N)^T\)
Stochastic gradient descent: for a misclassified point \((x_i,y_i)\), i.e. one with \(y_i\left(\sum_{j=1}^{N}\alpha_jy_jx_j\cdot x_i+b\right)\leq 0\), update
\(\alpha_i \leftarrow \alpha_i+\eta\)
\(b\leftarrow b+\eta y_i\)
In the dual form, the training samples appear only through inner products, which can be precomputed once as the Gram matrix
\(G=[x_i\cdot x_j]_{N\times N}\)
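The dual algorithm can be sketched as follows (a hypothetical `train_dual_perceptron`), touching the data only through the precomputed Gram matrix:

```python
import numpy as np

def train_dual_perceptron(X, y, eta=1.0, max_epochs=100):
    # Dual-form perceptron: store per-sample counts alpha instead of w;
    # training accesses the data only via G[i, j] = x_i . x_j.
    n = len(y)
    G = X @ X.T                  # Gram matrix, precomputed once
    alpha = np.zeros(n)
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for i in range(n):
            # w . x_i = sum_j alpha_j y_j (x_j . x_i)
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += eta
                b += eta * y[i]
                errors += 1
        if errors == 0:
            break
    w = (alpha * y) @ X          # recover w = sum_i alpha_i y_i x_i
    return alpha, b, w
```

With the same sweep order and \(\eta\), this visits the same sequence of updates as the primal form, since the decision condition is just \(w\cdot x_i+b\) rewritten through \(\alpha\).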