Machine Learning: Basic Concepts Explained

2 Choosing a Model

Criteria for a good model: (details omitted here) a good model should make the error on unseen data as small as possible; this error can be decomposed into bias and variance, as discussed below.

2.1 Bias & Variance

  • bias: the gap between the expected output of the models fitted on the training samples and the true value
  • variance: how much the predictions of models trained on different training samples spread around their own mean; large variance means the model's test-set performance changes a lot from one training sample to another
  • Definition of variance:
    $$\operatorname{var}[X]=E\left[(X-\mu)^{2}\right]=E\left[X^{2}-2 X \mu+\mu^{2}\right]=E\left[X^{2}\right]-2 \mu^{2}+\mu^{2}=E\left[X^{2}\right]-\mu^{2}$$
    $$E\left[X^{2}\right]=\operatorname{var}[X]+(E[X])^{2}$$
  • Expectation of the test sample $y$:
    $$E[f]=f,\qquad y=f+\varepsilon,\qquad E[\varepsilon]=0,\qquad \operatorname{var}[\varepsilon]=\sigma^{2},\qquad E[y]=E[f+\varepsilon]=f$$

Here the error from Part 02 of this series is decomposed into bias and variance. A simple model (left) produces error mainly through large bias, a situation called underfitting; a complex model (right) produces error mainly through large variance, a situation called overfitting. The small simulation below illustrates the decomposition.
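A minimal simulation sketch (not from the original post; the true function, noise level, polynomial degrees, and all names are illustrative assumptions). It refits a simple and a complex polynomial model on many resampled training sets and measures the bias² and variance of the predictions at fixed test points:

# Bias-variance illustration: refit models on many resampled training sets,
# then measure bias^2 and variance of the predictions at fixed test points.
import numpy as np

rng = np.random.RandomState(0)
x_test = np.linspace(-1, 1, 50)
f_test = np.sin(np.pi * x_test)            # true function values at the test points

def fit_and_predict(degree, n_train=20):
    x = rng.uniform(-1, 1, n_train)
    y = np.sin(np.pi * x) + rng.normal(0, 0.3, n_train)   # one noisy training sample
    coef = np.polyfit(x, y, degree)
    return np.polyval(coef, x_test)

for degree in (1, 9):                      # simple model vs. complex model
    preds = np.array([fit_and_predict(degree) for _ in range(200)])
    bias2 = np.mean((preds.mean(axis=0) - f_test) ** 2)
    variance = np.mean(preds.var(axis=0))
    print(f"degree={degree}: bias^2={bias2:.3f}, variance={variance:.3f}")

The degree-1 model shows large bias and small variance (underfitting); the degree-9 model shows the opposite (overfitting).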

2.2 Model Selection


  • Should NOT do: select or tune the model directly on the Testing Set. The Testing Set has its own bias, so a model chosen this way will perform worse on truly unseen data than its test score suggests.

Holdout Method

  • The dataset D is split into two mutually exclusive sets: a training set S and a test set T; the model is trained on S and its performance is evaluated on T (a minimal sketch follows this list).
  • Keep the data distributions of S and T as consistent as possible, so that the split itself does not introduce extra bias that distorts the final result.
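As an illustration (not from the original post; the dataset and model are arbitrary choices), the sketch below performs a holdout split with scikit-learn's train_test_split; stratify=y keeps the class distributions of S and T consistent:

# Holdout split: train on S, evaluate on T.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# stratify=y keeps the class proportions the same in S and T
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))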

N-fold Cross Validation

N-fold cross validation reduces the bias that comes from relying on a single validation split: the training data is divided into N folds, each fold serves once as the validation set while the rest is used for training, and the N validation scores are averaged to compare models (a minimal sketch follows).
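A minimal sketch (not from the original post; the dataset and the Ridge candidates are illustrative) of 5-fold cross validation with scikit-learn:

# 5-fold cross validation: each candidate model is scored as the average over the folds.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

for alpha in (0.1, 1.0, 10.0):                      # candidate models to compare
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=cv)
    print(f"alpha={alpha}: mean CV score = {scores.mean():.3f}")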

3 Optimization Methods

  • Vanilla Gradient descent
    $$w^{t+1} \leftarrow w^{t}-\eta^{t} g^{t},\qquad \eta^{t}=\frac{\eta}{\sqrt{t+1}},\qquad g^{t}=\frac{\partial C\left(\theta^{t}\right)}{\partial w}$$
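A minimal one-parameter sketch of this update (not from the original post; the toy loss C(w) = (w − 3)² is an assumption made purely for illustration):

# Vanilla gradient descent with the decaying learning rate eta^t = eta / sqrt(t + 1).
import numpy as np

def grad_C(w):
    return 2 * (w - 3.0)             # gradient of the assumed toy loss C(w) = (w - 3)^2

w, eta = 0.0, 0.2
for t in range(200):
    eta_t = eta / np.sqrt(t + 1)     # decaying step size
    w = w - eta_t * grad_C(w)
print(w)                             # approaches the minimum at w = 3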

  • Adagrad
    $$\left.\begin{array}{l} w^{t+1} \leftarrow w^{t}-\dfrac{\eta^{t}}{\sigma^{t}} g^{t} \\ \sigma^{t}=\sqrt{\dfrac{1}{t+1} \sum_{i=0}^{t}\left(g^{i}\right)^{2}} \end{array}\right\} \;\Longrightarrow\; w^{t+1} \leftarrow w^{t}-\frac{\eta}{\sqrt{\sum_{i=0}^{t}\left(g^{i}\right)^{2}}}\, g^{t}$$
    $\sigma^{t}$: root mean square of the previous derivatives of parameter $w$. Since $\eta^{t}=\eta/\sqrt{t+1}$, the $\sqrt{t+1}$ factors in $\eta^{t}$ and $\sigma^{t}$ cancel, which gives the simplified update on the right. A one-parameter sketch follows.
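A minimal sketch (not from the original post; the same assumed toy loss as in the vanilla-gradient-descent sketch above):

# Adagrad: per-parameter step size eta / sqrt(sum of squared past gradients).
import numpy as np

def grad_C(w):
    return 2 * (w - 3.0)              # gradient of the assumed toy loss C(w) = (w - 3)^2

w, eta = 0.0, 1.0
sum_sq_grad = 0.0
for t in range(200):
    g = grad_C(w)
    sum_sq_grad += g ** 2             # accumulate squared gradients
    w = w - eta / np.sqrt(sum_sq_grad) * g
print(w)                              # approaches the minimum at w = 3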

  • Gradient Descent
    $$\theta^{i}=\theta^{i-1}-\eta \nabla L\left(\theta^{i-1}\right)$$

  • Stochastic Gradient Descent
    $$L^{n}=\left(\hat{y}^{n}-\left(b+\sum_{i} w_{i} x_{i}^{n}\right)\right)^{2},\qquad \theta^{i}=\theta^{i-1}-\eta \nabla L^{n}\left(\theta^{i-1}\right)$$
    Here $L^{n}$ is the loss on a single, randomly picked example $n$, so each update uses the gradient of one example only (see the code further below).

  • Gradient descent: Feature Scaling
    Scale the features to roughly the same range; otherwise the loss contours are elongated, and gradient descent zig-zags instead of heading straight for the minimum (a short sketch follows).
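A minimal standardization sketch (not from the original post; scikit-learn's StandardScaler does the equivalent):

# Feature scaling by standardization: each feature gets zero mean and unit variance.
import numpy as np

X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0]])        # two features on very different scales

X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled)                      # both columns now have comparable magnitude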

  • Steepest Gradient descent
    $$g(t) := f\left(\mathbf{x}^{(k)}+t\, \mathbf{d}^{(k)}\right) \quad \text{over}\quad t \geq 0, \qquad \text{Set } \mathbf{x}^{(k+1)}=\mathbf{x}^{(k)}+t_{k}\, \mathbf{d}^{(k)}$$
    Here $\mathbf{d}^{(k)}$ is the descent direction (for steepest descent, $-\nabla f(\mathbf{x}^{(k)})$) and the step size $t_{k}$ is obtained by minimizing $g(t)$ over $t \geq 0$, i.e. an exact line search (a sketch follows).
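A minimal sketch (not from the original post; the quadratic f is an assumption, and scipy's minimize_scalar is used only to perform the 1-D line search):

# Steepest descent: at each step, a line search picks t_k minimizing f along -grad f.
import numpy as np
from scipy.optimize import minimize_scalar

A = np.array([[3.0, 0.5],
              [0.5, 1.0]])                           # assumed positive-definite quadratic

def f(x):
    return 0.5 * x @ A @ x

def grad_f(x):
    return A @ x

x = np.array([4.0, -2.0])
for k in range(20):
    d = -grad_f(x)                                   # steepest-descent direction d^(k)
    g = lambda t: f(x + t * d)                       # g(t) := f(x^(k) + t d^(k))
    t_k = minimize_scalar(g, bounds=(0.0, 10.0), method="bounded").x
    x = x + t_k * d
print(x)                                             # approaches the minimizer at the origin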

SGD and MBGD code

# Stochastic Gradient Descent (linear regression, one randomly chosen example per update)
import numpy as np

# Example data (assumed here so the snippet runs on its own; the original post defines X_b, y and m earlier):
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]        # add the bias column x0 = 1

n_epochs = 50
t0, t1 = 5, 50                         # learning-schedule hyperparameters

def learning_schedule(t):
    return t0 / (t + t1)

theta = np.random.randn(2, 1)          # random initialization

for epoch in range(n_epochs):
    for i in range(m):
        random_index = np.random.randint(m)
        xi = X_b[random_index:random_index + 1]
        yi = y[random_index:random_index + 1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)   # gradient of the squared error on one example
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients
# Stochastic Gradient Descent with Scikit-Learn

from sklearn.linear_model import SGDRegressor
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1)
sgd_reg.fit(X, y.ravel())
# MBGD: Mini-Batch Gradient Descent
import numpy as np
import random

def gen_line_data(sample_num=100):
    """
    Generate samples from y = 3*x1 + 4*x2 (no noise).
    """
    x1 = np.linspace(0, 9, sample_num)
    x2 = np.linspace(4, 13, sample_num)
    x = np.concatenate(([x1], [x2]), axis=0).T
    y = np.dot(x, np.array([3, 4]).T)
    return x, y

def mbgd(samples, y, step_size=0.01, max_iter_count=10000, batch_size=0.2):
    """Mini-batch gradient descent for linear regression without a bias term."""
    sample_num, dim = samples.shape
    y = y.flatten()
    w = np.ones((dim,), dtype=np.float32)
    loss = 10
    iter_count = 0
    while loss > 0.001 and iter_count < max_iter_count:
        loss = 0
        error = np.zeros((dim,), dtype=np.float32)

        # draw a random mini-batch (batch_size is the fraction of the dataset used per step)
        index = random.sample(range(sample_num), int(np.ceil(sample_num * batch_size)))
        batch_samples = samples[index]
        batch_y = y[index]

        # accumulate the gradient of the squared error over the mini-batch
        for i in range(len(batch_samples)):
            predict_y = np.dot(w.T, batch_samples[i])
            for j in range(dim):
                error[j] += (batch_y[i] - predict_y) * batch_samples[i][j]
        # update each weight with the averaged mini-batch gradient
        for j in range(dim):
            w[j] += step_size * error[j] / len(batch_samples)

        # compute the mean squared error on the full dataset to check convergence
        for i in range(sample_num):
            predict_y = np.dot(w.T, samples[i])
            squared_error = (1 / (sample_num * dim)) * np.power((predict_y - y[i]), 2)
            loss += squared_error

        iter_count += 1
    return w

if __name__ == '__main__':
    samples, y = gen_line_data()
    w = mbgd(samples, y)
    print(w)   # should be close to [3, 4]

Up next: evaluation metrics for regression models.

Reposted from blog.csdn.net/weixin_37409506/article/details/90297545