Regression: Derivation of Linear Regression and a Python Implementation

Linear regression:

\(h_\theta (x) = \theta_0 + \theta_1x_1+\cdots + \theta_nx_n = X\theta\)
The hypothesis \(h_\theta (x)\) is an \(m\times 1\) vector, \(\theta\) is an \((n+1)\times 1\) vector containing the \(n+1\) model parameters, and \(X\) is an \(m\times (n+1)\) matrix (one row per sample, with a leading column of ones for the intercept).
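
As a quick illustration (a minimal NumPy sketch; the variable names and the leading column of ones for \(\theta_0\) are my own conventions), the hypothesis for all \(m\) samples is a single matrix product:

```python
import numpy as np

m, n = 5, 3                               # toy sizes: 5 samples, 3 features
X_raw = np.random.rand(m, n)              # raw feature matrix, shape (m, n)
X = np.hstack([np.ones((m, 1)), X_raw])   # prepend a column of ones for theta_0 -> shape (m, n+1)
theta = np.random.rand(n + 1)             # parameter vector, shape (n+1,)

h = X @ theta                             # h_theta(x) for every sample at once, shape (m,)
```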

Maximum likelihood estimation

Principle: an event with large probability is more likely to occur in a single observation; conversely, an event that did occur in a single observation should be assigned a large probability.
Goal: find the model parameters under which the observed data is generated with the highest probability.

Derivation

\(y^{(i)} = \theta^Tx^{(i)} + \epsilon ^{(i)}\)

  • \(y^{(i)}\): the i-th label
  • \(x^{(i)}\): the i-th sample
  • \(\theta^Tx^{(i)}\): the prediction for the i-th sample under the current \(\theta\)
  • \(\epsilon ^{(i)}\): the error between the prediction and the actual value under the current \(\theta\)
  1. The errors \(\epsilon ^{(i)}\ (1\leq i \leq m)\) are assumed independent and identically distributed, following a Gaussian distribution with mean 0 and variance \(\sigma^2\) (central limit theorem)
  2. In practice, many random phenomena can be viewed as the combined effect of a large number of independent factors and therefore tend to follow a normal distribution

Gaussian distribution: \(p(x)=\frac{1}{\sigma\sqrt{ 2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\)

For the i-th sample:

  • \(p(\epsilon^{(i)})=\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(\epsilon^{(i)})^2}{2\sigma^2}}\)
  • \(p(y^{(i)}|x^{(i)};\theta)=\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}}\)

Likelihood function:

  • \(L(\theta) =\prod^m_{i=1}p(y^{(i)}|x^{(i)};\theta) = \prod^m_{i=1}\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}}\)
    Take the logarithm (this does not change where the maximum is attained and simplifies the computation):
  • \(l(\theta)=\log L(\theta)\)
    = \(\log\prod^m_{i=1}\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}}\)
    = \(\sum^m_{i=1}\log\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(y^{(i)}-\theta^Tx^{(i)})^2}{2\sigma^2}}\)
    = \(\sum_{i=1}^m\log\frac{1}{\sigma \sqrt{2\pi}}-\frac{1}{\sigma^2}\cdot{\frac{1}{2}}\sum_{i=1}^m(y^{(i)}-\theta^Tx^{(i)})^2\)

To maximize \(l(\theta)\), it suffices to minimize \({\frac{1}{2}}\sum_{i=1}^m(y^{(i)}-\theta^Tx^{(i)})^2\): \(\sigma\) is the (fixed) standard deviation of the error, so the first term is a constant and the second term only carries a constant negative factor.

Loss function: \(loss(y,\hat{y})=J(\theta)={\frac{1}{2}}\sum_{i=1}^m(y^{(i)}-\theta^Tx^{(i)})^2\) (a small code sketch of it follows the symbol list below)

  • \(y^{(i)}\): the i-th label
  • \(x^{(i)}\): the i-th sample
  • \(\theta\): the variables the model learns, chosen to minimize the loss function
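
A minimal sketch of this loss in NumPy (the function name and array shapes are my own assumptions):

```python
import numpy as np

def loss(theta, X, y):
    """J(theta) = 0.5 * sum_i (y_i - theta^T x_i)^2, with X of shape (m, n+1) and y of shape (m,)."""
    residual = y - X @ theta
    return 0.5 * float(residual @ residual)
```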

Solution method 1: set the derivative to zero (the normal equation):
\(J(\theta)={\frac{1}{2}}\sum_{i=1}^m(y^{(i)}-\theta^Tx^{(i)})^2 = \frac{1}{2}(X\theta -Y)^T(X\theta -Y)\) --> \(\min_\theta J(\theta)\)
\(\nabla_\theta J(\theta)=\nabla_\theta(\frac{1}{2}(X\theta -Y)^T(X\theta -Y))\)
= \(\nabla_\theta(\frac{1}{2}(\theta^TX^T -Y^T)(X\theta -Y))\)
= \(\nabla_\theta(\frac{1}{2}(\theta^TX^TX\theta -\theta^TX^TY -Y^TX\theta +Y^TY))\)
= \(\frac{1}{2}(2X^TX\theta -X^TY -(Y^TX)^T)\)
= \(X^TX\theta -X^TY\)
If \(X^TX\) is invertible => \(\theta=(X^TX)^{-1}X^TY\)
In practice \(X^TX\) is often singular, for example when the number of features exceeds the number of samples. Adding a regularization term \(\lambda I\) makes the matrix invertible => \(\theta=(X^TX +\lambda I)^{-1}X^TY\) (this is the ridge-regression solution, with objective \(J(\theta)={\frac{1}{2}}\sum_{i=1}^m(y^{(i)}-\theta^Tx^{(i)})^2 + \lambda \sum_{j=1}^{n}\theta^2_j\)).
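
A minimal NumPy sketch of both closed-form solutions (the function names are my own; np.linalg.solve is used instead of forming the inverse explicitly):

```python
import numpy as np

def normal_equation(X, y):
    """Ordinary least squares: theta = (X^T X)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge_solution(X, y, lam=1.0):
    """Ridge regression: theta = (X^T X + lam * I)^{-1} X^T y."""
    n_params = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_params), X.T @ y)
```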

Identities used in the derivation above (a quick numerical check of one of them follows this list):
\((XY)^T=Y^TX^T\)
\(\frac{d(u^Tv)}{dx} = \frac{du^T}{dx}\cdot v + \frac{dv^T}{dx}\cdot u\)
\(\frac{\partial (AX)}{\partial X} = A^T\)
\(\frac{\partial (AX)}{\partial X^T} =\frac{\partial (AB^T)}{\partial B} = A\)
\(\frac{\partial (X^TA)}{\partial X} =\frac{\partial (A^TX)}{\partial X} =(A^T)^T = A\)
\(\frac{\partial (X^TX)}{\partial X} = \frac{\partial X^T \cdot{(X)}}{\partial X} + \frac{ \partial (X^T)\cdot{} X}{\partial X} = 2X\)
\(\frac{\partial (X^TAX)}{\partial X} = \frac{\partial X^T \cdot{(AX)}}{\partial X} + \frac{ \partial (X^TA)\cdot{} X}{\partial X} = AX+A^TX\)
----- If \(A\) is symmetric, this reduces to \(\frac{\partial (X^TAX)}{\partial X} =2AX\)
\(\frac{\partial (A^TXB)}{\partial X} = AB^T\)
\(\frac{\partial (A^TX^TXA)}{\partial X} = XAA^T + XAA^T = 2XAA^T\)
\(\frac{\partial |A|}{\partial A} =|A|(A^{-1})^T\)
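
As a sanity check, the identity \(\frac{\partial (X^TAX)}{\partial X} = AX + A^TX\) can be verified numerically against central finite differences (a minimal sketch; the matrix size, seed, step, and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))
x = rng.normal(size=n)

analytic = (A + A.T) @ x                  # claimed gradient of f(x) = x^T A x

eps = 1e-6
numeric = np.zeros(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    # central finite difference of f(x) = x^T A x along coordinate i
    numeric[i] = ((x + e) @ A @ (x + e) - (x - e) @ A @ (x - e)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```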

Solution method 2: gradient descent, which iteratively approaches the global (or a local) optimum:

(Posting this first; formatting and the remaining sections are still to be filled in.)

Code
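
A minimal end-to-end sketch of the closed-form solution derived above (the synthetic data, true parameters, and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# synthetic data: y = 2 + 3*x1 - 1.5*x2 + noise
m = 200
X_raw = rng.normal(size=(m, 2))
true_theta = np.array([2.0, 3.0, -1.5])
X = np.hstack([np.ones((m, 1)), X_raw])            # add the bias column
y = X @ true_theta + rng.normal(scale=0.1, size=m)

# normal equation: theta = (X^T X)^{-1} X^T y
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)   # close to [2.0, 3.0, -1.5]
```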

Least squares

Derivation

Code
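
A minimal least-squares sketch (here NumPy's np.linalg.lstsq performs the least-squares solve instead of forming \(X^TX\) explicitly; the toy data is my own):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 2))])   # bias column + 2 features
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=m)

# least-squares fit of X @ theta ~ y
theta_hat, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(theta_hat)   # close to [1.0, 2.0, -0.5]
```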

Gradient descent

Derivation
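
A short sketch of the update rule, using only the gradient already derived in method 1: \(\nabla_\theta J(\theta)=X^TX\theta - X^TY = X^T(X\theta - Y)\). Gradient descent repeatedly moves \(\theta\) a small step against the gradient, with learning rate \(\alpha\):
\(\theta := \theta - \alpha X^T(X\theta - Y)\)
Since \(J(\theta)\) is convex, iterating this update with a suitable \(\alpha\) converges to the global minimum.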

Code
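
A minimal batch gradient descent sketch (the learning rate, iteration count, and toy data are my own assumptions):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=5000):
    """Minimize J(theta) = 0.5 * ||X theta - y||^2 by batch gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y)        # gradient of the loss
        theta -= alpha * grad / len(y)      # divide by m to keep the step size stable
    return theta

# toy data: y = 1 + 2*x + noise
rng = np.random.default_rng(1)
m = 100
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 1))])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=m)

print(gradient_descent(X, y))   # close to [1.0, 2.0]
```

Dividing the gradient by the sample count only rescales the learning rate; it keeps the step size roughly independent of how many samples are used.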


Reposted from www.cnblogs.com/yunp-kon/p/11134816.html