Multivariate Linear Model (with Formula Derivations) - Regression - Supervised Learning - Machine Learning

&emsp;&emsp;Suppose an individual $x$ has $d$ features, i.e. $x=(x^{1},x^{2},...,x^{d})$, where $x^{i}$ is the $i$-th feature. A linear model tries to **<font color=#A52A2A >obtain the prediction through a linear combination of the features</font>**, i.e. $f(x)=w^{T}x+b=w_{1}x^{1}+w_{2}x^{2}+...+w_{d}x^{d}+b$, where $w_{i}$ is the weight of the $i$-th feature: it both adjusts for the feature's scale and reflects how important that feature is to the prediction; $w^{T}=(w_{1},w_{2},...,w_{d})$; $x^{T}=(x^{1},x^{2},...,x^{d})$; $b$ is the part of the prediction that cannot be explained by $x$. When $d=1$, this reduces to the simplest linear model $f(x)=wx+b$.
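&emsp;&emsp;As a concrete illustration, here is a minimal sketch of how such a prediction is computed; the feature values, weights, and intercept below are made-up numbers, not taken from any dataset:

```python
import numpy as np

# Hypothetical individual with d = 3 features: x = (x^1, x^2, x^3)
x = np.array([2.0, 0.5, 1.0])

# Hypothetical weights w_i (one per feature) and intercept b
w = np.array([0.3, -1.2, 0.8])
b = 0.5

# f(x) = w^T x + b = w_1*x^1 + w_2*x^2 + w_3*x^3 + b
f_x = w @ x + b
print(f_x)  # 0.3*2.0 + (-1.2)*0.5 + 0.8*1.0 + 0.5 = 1.3
```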


### Once we have $w$ and $b$, we have the linear model. How do we find $w$ and $b$?

&emsp;&emsp;Suppose the training set contains $n$ individuals, i.e. $D=\left \{ (x_{1},y_{1}), (x_{2},y_{2}),..., (x_{n},y_{n}) \right \}$, where $x_{i}$ is the $i$-th individual and $y_{i}$ is the true value corresponding to the $i$-th individual.

I. $f(x)=wx+b$
<center>What is difficult in the world must be done while it is easy; what is great in the world must be done while it is small. (Laozi)</center>&emsp;&emsp;Let us start with the simplest linear model, $f(x)=wx+b$, i.e. assume each individual has only one feature. We want the prediction $f(x_{i})$ to be as close to the true value $y_{i}$ as possible. How should we measure the difference between them?
&emsp;&emsp;Intuitively, there are two natural options:
$1) |f(x_{i})-y_{i}|$
$2)(f(x_{i})-y_{i})^{2}$
&emsp;&emsp;Option 2 is Gauss's <font color=#A52A2A >least squares method</font>. Summing this difference between prediction and true value over all individuals gives:
$g(w,b)=\sum_{i=1}^{n}(f(x_{i})-y_{i})^{2}=\sum_{i=1}^{n}(wx_{i}+b-y_{i})^{2}$
&emsp;&emsp;Our goal is to find the $w$ and $b$ that minimize $g(w,b)$, so we set the partial derivatives to zero: <font size=4 >$$\left\{\begin{matrix}
\frac{\partial g(w,b)}{\partial w}=0\\
\frac{\partial g(w,b)}{\partial b}=0
\end{matrix}\right.$$</font>
&emsp;&emsp;The condition $\frac{\partial g(w,b)}{\partial b}=2\sum_{i=1}^{n}(wx_{i}+b-y_{i})=0$ gives $b=\bar{y}-w\bar{x}$; substituting this into $\frac{\partial g(w,b)}{\partial w}=2\sum_{i=1}^{n}(wx_{i}+b-y_{i})x_{i}=0$ and solving for $w$ yields:
 <font size=4 >$$\left\{\begin{matrix}
w=\frac{\sum_{i=1}^{n}y_{i}(x_{i}-\bar{x})}{\sum_{i=1}^{n}x_{i}^2-n\bar{x}^{2}}\\
b=\bar{y}-w\bar{x}\\
\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_{i}\\
\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_{i}
\end{matrix}\right.$$</font>
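&emsp;&emsp;A small sketch of this closed-form solution in code; the data below are synthetic (generated only to check the formulas), and the helper name `fit_simple` is ours, not from any library:

```python
import numpy as np

def fit_simple(x, y):
    """Closed-form least squares for f(x) = w*x + b (one feature per individual)."""
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    # w = sum_i y_i (x_i - x_bar) / (sum_i x_i^2 - n * x_bar^2)
    w = np.sum(y * (x - x_bar)) / (np.sum(x ** 2) - n * x_bar ** 2)
    # b = y_bar - w * x_bar
    b = y_bar - w * x_bar
    return w, b

# Synthetic data drawn from y = 2x + 1 plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

w, b = fit_simple(x, y)
print(w, b)  # should recover roughly w = 2 and b = 1
```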
II. $f(x)=w^{T}x+b$
&emsp;&emsp;Now we go from the simple case to the multivariate model $f(x)=w^{T}x+b$. We still want the difference between predictions and true values to be as small as possible, and we again measure it with least squares:
$g(w,b)=(w^{T}x_{1}+b-y_{1})^{2}+(w^{T}x_{2}+b-y_{2})^{2}+...+(w^{T}x_{n}+b-y_{n})^{2}$

 <font size=2 >$g(w,b)=[(w^{T}x_{1}+b-y_{1}), (w^{T}x_{2}+b-y_{2}), ..., (w^{T}x_{n}+b-y_{n})]\begin{bmatrix}(w^{T}x_{1}+b-y_{1})\\ (w^{T}x_{2}+b-y_{2})\\ ...\\(w^{T}x_{n}+b-y_{n})\\\end{bmatrix}$ </font>
 
 &emsp;&emsp;Derivation:
  <font size=2 >$\begin{bmatrix}(w^{T}x_{1}+b-y_{1})\\ (w^{T}x_{2}+b-y_{2})\\ ...\\(w^{T}x_{n}+b-y_{n})\\\end{bmatrix}=\begin{bmatrix}(w^{T}x_{1}+b)\\ (w^{T}x_{2}+b)\\ ...\\(w^{T}x_{n}+b)\\\end{bmatrix}-\begin{bmatrix}y_{1}\\ y_{2}\\ ...\\y_{n}\\\end{bmatrix}=\begin{bmatrix}(x_{1}^{T} ,1)\tilde{w}\\ (x_{2}^{T} ,1)\tilde{w}\\ ...\\(x_{n}^{T} ,1)\tilde{w}\\\end{bmatrix}-\begin{bmatrix}y_{1}\\ y_{2}\\ ...\\y_{n}\\\end{bmatrix}=\begin{bmatrix}x_{1}^{T} ,1\\x_{2}^{T} ,1\\ ...\\x_{n}^{T} ,1\\\end{bmatrix}\tilde{w}-\begin{bmatrix}y_{1}\\ y_{2}\\ ...\\y_{n}\\\end{bmatrix}=X\tilde{w}-Y$</font>
$\text{Note: } w^{T}x_{i}+b=(x_{i}^{T},1)\begin{bmatrix}w\\ b\end{bmatrix}=(x_{i}^{T},1)\tilde{w}; \text{ let } X=\begin{bmatrix}x_{1}^{T} ,1\\x_{2}^{T} ,1\\ ...\\x_{n}^{T} ,1\\\end{bmatrix},Y=\begin{bmatrix}y_{1}\\ y_{2}\\ ...\\y_{n}\\\end{bmatrix},\tilde{w}=\begin{bmatrix}w\\ b\end{bmatrix}$
 &emsp;&emsp;Therefore $g(\tilde{w})=(X\tilde{w}-Y)^{T}(X\tilde{w}-Y)$. Again we want to minimize $g(\tilde{w})$, so we set the derivative with respect to $\tilde{w}$ to zero:
$\frac{\partial g(\tilde{w})}{\partial \tilde{w}}=0$
Derivation:
$g(\tilde{w})=(X\tilde{w}-Y)^{T}(X\tilde{w}-Y)=((X\tilde{w})^{T}-Y^{T})(X\tilde{w}-Y)=\tilde{w}^{T}X^{T}X\tilde{w}-Y^{T}X\tilde{w}-\tilde{w}^{T}X^{T}Y+Y^{T}Y=\tilde{w}^{T}X^{T}X\tilde{w}-2\tilde{w}^{T}X^{T}Y+Y^{T}Y$
(the last step uses the fact that $Y^{T}X\tilde{w}$ is a scalar, so it equals its transpose $\tilde{w}^{T}X^{T}Y$).
Using the matrix-calculus identities $\frac{\partial \tilde{w}^{T}A\tilde{w}}{\partial \tilde{w}}=(A+A^{T})\tilde{w}=2A\tilde{w}$ for symmetric $A$, and $\frac{\partial a^{T}\tilde{w}}{\partial \tilde{w}}=a$,
we get $\frac{\partial g(\tilde{w})}{\partial \tilde{w}}=2X^{T}X\tilde{w}-2X^{T}Y=0$, i.e. $X^{T}X\tilde{w}=X^{T}Y$.
Finally, when $X^{T}X$ is invertible, the result is $\tilde{w}=(X^{T}X)^{-1}X^{T}Y$.
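&emsp;&emsp;A minimal sketch of this normal-equation solution; the data are synthetic and the helper name `fit_multivariate` is ours. The sketch solves the linear system $X^{T}X\tilde{w}=X^{T}Y$ instead of forming the inverse explicitly, which is numerically safer and gives the same $\tilde{w}$ when $X^{T}X$ is invertible:

```python
import numpy as np

def fit_multivariate(X_raw, y):
    """Least squares for f(x) = w^T x + b via the normal equation."""
    n = X_raw.shape[0]
    # Append a column of ones so each row becomes (x_i^T, 1); then w_tilde = (w, b)
    X = np.hstack([X_raw, np.ones((n, 1))])
    # Solve X^T X w_tilde = X^T Y (equivalent to w_tilde = (X^T X)^{-1} X^T Y)
    w_tilde = np.linalg.solve(X.T @ X, X.T @ y)
    return w_tilde[:-1], w_tilde[-1]

# Synthetic data: two features, true weights (1.5, -2.0), intercept 0.7
rng = np.random.default_rng(1)
X_raw = rng.normal(size=(100, 2))
y = X_raw @ np.array([1.5, -2.0]) + 0.7 + rng.normal(scale=0.05, size=100)

w, b = fit_multivariate(X_raw, y)
print(w, b)  # roughly [1.5, -2.0] and 0.7
```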


Reposted from blog.csdn.net/yeziand01/article/details/80737531