1 Regression

Regression: the output is a scalar

What is regression? Any task whose output is a single numeric value is a regression task.

Step 1: Model (function set)

A set of functions:

$f_1:y = 10.0+9.0\cdot x_{cp}$

$f_2:y = 9.8+9.2\cdot x_{cp}$

$f_3:y = -0.8-1.2\cdot x_{cp}$

Linear Model

$x_i$: an attribute of the input $x$ (a feature)
$w_i$: weight, $b$: bias

$y = b + \sum w_ix_i$
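
As a minimal sketch, the linear model can be written as a plain Python function. The name `linear_model` and the list-based arguments are assumptions made for illustration; they are not part of the notes.

```python
def linear_model(x, w, b):
    """Return y = b + sum_i w[i] * x[i] for a single example."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return b + sum(w_i * x_i for w_i, x_i in zip(w, x))


# With a single feature x_cp, f_1 from the function set has b = 10.0, w = 9.0.
# The input value 10.0 is a made-up example.
y = linear_model([10.0], [9.0], 10.0)  # 10.0 + 9.0 * 10.0
```

Each choice of `w` and `b` picks one function out of the function set.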

Step 2: Goodness of function

$y = b + w\cdot x_{cp}$

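$\hat{y}$ (y hat) denotes the correct value, i.e. the observed target.
A superscript indexes one complete data example;
a subscript indexes one attribute within that example.

To measure how good a function is, we need a loss function.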

Loss function

$L(f) = \sum_{n=1}^{10} (\hat{y}^n-f(x^n_ {cp}) )^2$

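The loss function is a function of a function: its input is a function $f$, and its output says how bad that function is.

Because $f$ is determined by $w$ and $b$, $L(f)$ can be written as $L(w,b)$: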

$L(f) = L(w,b)= \sum_{n=1}^{10} (\hat{y}^n- (b+w\cdot x^n_ {cp}) )^2$
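
A small sketch of this loss in Python may help. The function name `loss` and the three data points are hypothetical (the notes use 10 training examples that are not listed here); the data is generated exactly by y = 2 + 3x so the results are easy to check by hand.

```python
def loss(w, b, xs, y_hats):
    """L(w, b) = sum_n (y_hat^n - (b + w * x^n))^2."""
    return sum((y_hat - (b + w * x)) ** 2 for x, y_hat in zip(xs, y_hats))


# Hypothetical data, generated exactly by y = 2 + 3 * x (not from the notes).
xs = [1.0, 2.0, 3.0]
y_hats = [5.0, 8.0, 11.0]

perfect = loss(3.0, 2.0, xs, y_hats)  # every residual is zero
worse = loss(2.0, 2.0, xs, y_hats)    # residuals are 1, 2 and 3
```

Here `perfect` is 0 and `worse` is 1 + 4 + 9 = 14, so the loss ranks the first pair (w, b) as the better function.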

Step 3: Best function

Pick the best function:

$f^*= {\arg \min_{f}}L(f)$

$w^*,b^*= {\arg \min_{w,b}}L(w,b)$

$= {\arg \min_{w,b}}\sum_{n=1}^{10} (\hat{y}^n- (b+w\cdot x^n_ {cp}) )^2$

Step 3: Gradient Descent

One parameter

$w^*= {\arg \min_w}L(w)$

Pick an initial value $w_0$.
Compute

$\frac{dL}{dw}|_{w=w_0}$

$w_1 \leftarrow w_0-\eta\frac{dL}{dw}|_{w=w_0}$

Compute

$\frac{dL}{dw}|_{w=w_1}$

$w_2 \leftarrow w_1-\eta\frac{dL}{dw}|_{w=w_1}$

Continue in the same way for many iterations.
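
The update rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the bias is dropped so that only $w$ is learned, the data is hypothetical (generated by y = 3x), and the names `dL_dw` and `gradient_descent_1d` are made up for this example.

```python
def dL_dw(w, xs, y_hats):
    """Derivative of L(w) = sum_n (y_hat^n - w * x^n)^2 with respect to w."""
    return sum(-2.0 * x * (y_hat - w * x) for x, y_hat in zip(xs, y_hats))


def gradient_descent_1d(w0, eta, steps, xs, y_hats):
    w = w0
    for _ in range(steps):
        # w_{t+1} <- w_t - eta * dL/dw evaluated at w_t
        w = w - eta * dL_dw(w, xs, y_hats)
    return w


# Hypothetical data generated by y = 3 * x, so the minimizer is w = 3.
xs = [1.0, 2.0, 3.0]
y_hats = [3.0, 6.0, 9.0]
w_star = gradient_descent_1d(w0=0.0, eta=0.01, steps=200, xs=xs, y_hats=y_hats)
```

With this data and step size, `w_star` ends up very close to 3.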

Two parameters

$w^*,b^*= {\arg \min_{w,b}}L(w,b)$

Pick initial values $w_0$, $b_0$.
Compute (recall how to take partial derivatives from calculus)

$\frac{\partial L}{\partial w}|_{w=w_0,b=b_0} , \frac{\partial L}{\partial b}|_{w=w_0,b=b_0}$

$w_1 \leftarrow w_0-\eta\frac{\partial L}{\partial w}|_{w=w_0,b=b_0}$

$b_1 \leftarrow b_0-\eta\frac{\partial L}{\partial b}|_{w=w_0,b=b_0}$

Compute

$\frac{\partial L}{\partial w}|_{w=w_1,b=b_1} , \frac{\partial L}{\partial b}|_{w=w_1,b=b_1}$

$w_2 \leftarrow w_1-\eta\frac{\partial L}{\partial w}|_{w=w_1,b=b_1}$

$b_2 \leftarrow b_1-\eta\frac{\partial L}{\partial b}|_{w=w_1,b=b_1}$
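
For the squared-error loss of Step 2, the partial derivatives follow from the chain rule: $\frac{\partial L}{\partial w} = \sum_n 2(\hat{y}^n-(b+w\cdot x^n_{cp}))(-x^n_{cp})$ and $\frac{\partial L}{\partial b} = \sum_n 2(\hat{y}^n-(b+w\cdot x^n_{cp}))(-1)$. The sketch below plugs these into the two-parameter update. The data is hypothetical (generated by y = 2 + 3x), and the function names, step size and iteration count are assumptions chosen so that this small example converges.

```python
def gradients(w, b, xs, y_hats):
    """Return (dL/dw, dL/db) for L(w, b) = sum_n (y_hat^n - (b + w * x^n))^2."""
    grad_w = 0.0
    grad_b = 0.0
    for x, y_hat in zip(xs, y_hats):
        residual = y_hat - (b + w * x)
        grad_w += -2.0 * residual * x
        grad_b += -2.0 * residual
    return grad_w, grad_b


def gradient_descent(w0, b0, eta, steps, xs, y_hats):
    w, b = w0, b0
    for _ in range(steps):
        grad_w, grad_b = gradients(w, b, xs, y_hats)
        # Both gradients are evaluated at the old (w, b) before either is updated.
        w, b = w - eta * grad_w, b - eta * grad_b
    return w, b


# Hypothetical data generated by y = 2 + 3 * x (not from the notes).
xs = [1.0, 2.0, 3.0]
y_hats = [5.0, 8.0, 11.0]
w_star, b_star = gradient_descent(0.0, 0.0, eta=0.02, steps=5000, xs=xs, y_hats=y_hats)
```

On this data the result approaches w = 3 and b = 2, the values used to generate it.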

Problem

Gradient descent aims for the global minimum, but it can:
- get stuck at a local minimum
- get stuck at a saddle point
- be very slow on a plateau

The loss function of linear regression is convex, so there is no need to worry about local minima here.

Learning Rate

$\eta$
The learning rate controls the step size, and with it how fast learning proceeds.
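
To make the effect of $\eta$ concrete, here is a small sketch on a hypothetical one-parameter loss $L(w) = (w-3)^2$, whose minimum is at $w = 3$. The loss, the function name `run` and the three step sizes are assumptions chosen for illustration.

```python
def run(eta, steps=50, w0=0.0):
    """Gradient descent on the hypothetical loss L(w) = (w - 3)^2."""
    w = w0
    for _ in range(steps):
        w = w - eta * 2.0 * (w - 3.0)  # dL/dw = 2 * (w - 3)
    return w


w_small = run(eta=0.001)  # steps too small: still far from 3 after 50 steps
w_good = run(eta=0.1)     # reasonable step size: ends close to 3
w_large = run(eta=1.1)    # steps too large: each update overshoots and diverges
```

A step size that is too small wastes iterations, while one that is too large makes the loss grow instead of shrink.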

Another linear model

$y = b + w_1 \cdot x_{cp}+w_2\cdot( x_{cp})^2$

$y = b + w_1 \cdot x_{cp}+w_2\cdot( x_{cp})^2+w_3\cdot( x_{cp})^3$

$y = b + w_1 \cdot x_{cp}+w_2\cdot( x_{cp})^2+w_3\cdot( x_{cp})^3+w_4\cdot( x_{cp})^4$

$y = b + w_1 \cdot x_{cp}+w_2\cdot( x_{cp})^2+w_3\cdot( x_{cp})^3+w_4\cdot( x_{cp})^4+w_5\cdot( x_{cp})^5$

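Whether a model is linear depends on whether its output is linear in its parameters. The models above are polynomials in the input but linear in the bias and the weights, so they are still linear models.
- A more complex model yields lower error on training data, if we can truly find the best function.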
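
One way to see why the polynomial models are still linear is to treat each power of $x_{cp}$ as a separate feature. The sketch below does that; the helper names and the parameter values are hypothetical.

```python
def polynomial_features(x_cp, degree):
    """Map x_cp to [x_cp, x_cp^2, ..., x_cp^degree]."""
    return [x_cp ** k for k in range(1, degree + 1)]


def predict(x_cp, w, b):
    """y = b + sum_k w[k-1] * x_cp^k, which is linear in b and w."""
    features = polynomial_features(x_cp, len(w))
    return b + sum(w_k * f_k for w_k, f_k in zip(w, features))


# Hypothetical parameters: y = 1 + 2 * x + 3 * x^2 at x = 2 gives 1 + 4 + 12.
y = predict(2.0, w=[2.0, 3.0], b=1.0)

# Linearity in the parameters: doubling b and every weight doubles the output.
y_doubled = predict(2.0, w=[4.0, 6.0], b=2.0)
```

After the feature expansion, the model has the same form $y = b + \sum w_ix_i$ as before, so the same gradient descent procedure applies.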

Model Selection

Model (degree)	Training error	Testing error
1	31.9	35.0
2	15.4	18.4
3	15.3	18.1
4	14.9	28.2
5	12.8	232.1

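- A more complex model does not always lead to better performance on testing data.
- This is overfitting.

The model space of a complex model contains the model space of a simpler one, so its error on the training data is lower. That does not mean its error on the testing data is lower: a model that is too complex overfits.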
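
The numbers in the Model Selection table make the point directly. The short sketch below only re-reads those numbers; the dictionary layout and variable names are assumptions.

```python
# (training error, testing error) for models 1 to 5, copied from the table.
errors = {
    1: (31.9, 35.0),
    2: (15.4, 18.4),
    3: (15.3, 18.1),
    4: (14.9, 28.2),
    5: (12.8, 232.1),
}

# The most complex model wins on training data ...
best_on_training = min(errors, key=lambda m: errors[m][0])
# ... but a simpler one wins on testing data.
best_on_testing = min(errors, key=lambda m: errors[m][1])
```

Training error keeps falling from model 1 to model 5, while testing error is lowest for model 3 and rises sharply after it.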

What are the hidden factors?

Consider how the Pokémon species affects the CP value.

Back to step 1: Redesign the Model

$\text{if } x_s=\text{Pidgey}: y = b_1+w_1\cdot x_{cp}$

$\text{if } x_s=\text{Weedle}: y = b_2+w_2\cdot x_{cp}$

$\text{if } x_s=\text{Caterpie}: y = b_3+w_3\cdot x_{cp}$

$\text{if } x_s=\text{Eevee}: y = b_4+w_4\cdot x_{cp}$

$\downarrow$

$y = b_1 \cdot \delta(x_s=\text{Pidgey}) +w_1\cdot\delta(x_s=\text{Pidgey})\cdot x_{cp}$

$+b_2 \cdot \delta(x_s=\text{Weedle}) +w_2\cdot\delta(x_s=\text{Weedle})\cdot x_{cp}$

$+b_3 \cdot \delta(x_s=\text{Caterpie}) +w_3\cdot\delta(x_s=\text{Caterpie})\cdot x_{cp}$

$+b_4 \cdot \delta(x_s=\text{Eevee}) +w_4\cdot\delta(x_s=\text{Eevee})\cdot x_{cp}$
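
The single-equation form with $\delta$ can be sketched as follows: $\delta$ is an indicator that is 1 when the species matches and 0 otherwise, so only one species' bias and weight contribute to any given prediction. The parameter values and the names `delta`, `SPECIES` and `predict` are hypothetical.

```python
def delta(condition):
    """Indicator: 1.0 if the condition holds, else 0.0."""
    return 1.0 if condition else 0.0


SPECIES = ["Pidgey", "Weedle", "Caterpie", "Eevee"]


def predict(x_s, x_cp, b, w):
    """y = sum over species k of b[k]*delta(x_s = k) + w[k]*delta(x_s = k)*x_cp."""
    return sum(
        b[k] * delta(x_s == k) + w[k] * delta(x_s == k) * x_cp
        for k in SPECIES
    )


# Hypothetical parameter values, for illustration only.
b = {"Pidgey": 1.0, "Weedle": 2.0, "Caterpie": 3.0, "Eevee": 4.0}
w = {"Pidgey": 0.5, "Weedle": 1.5, "Caterpie": 2.5, "Eevee": 3.5}

y = predict("Weedle", 10.0, b, w)  # only the Weedle terms survive: 2.0 + 1.5 * 10.0
```

Written this way, the indicator values and their products with $x_{cp}$ act as features, so the redesigned model is still linear in its parameters.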

Training error = 3.8, testing error = 14.3

This model performs better on the testing set.

Are there any other hidden factors?

Consider how HP, weight, and height affect the CP value.

Back to step 1: Redesign the Model Again

$\text{if } x_s=\text{Pidgey}: y' = b_1+w_1\cdot x_{cp} + w_5\cdot (x_{cp})^2$

$\text{if } x_s=\text{Weedle}: y' = b_2+w_2\cdot x_{cp}+ w_6\cdot (x_{cp})^2$

$\text{if } x_s=\text{Caterpie}: y' = b_3+w_3\cdot x_{cp}+ w_7\cdot (x_{cp})^2$

$\text{if } x_s=\text{Eevee}: y' = b_4+w_4\cdot x_{cp}+ w_8\cdot (x_{cp})^2$