Machine Learning: Basic Concepts
2 Choosing a Model
Criteria for judging a good model: omitted.
The goal is to minimize the error; the next subsection breaks this error down into its sources.
2.1 Bias & Variance
- bias: the gap between the expected output of a model fitted on the samples and the true value
- variance: how much models trained on different samples disagree with each other, i.e. how sensitive the fitted model is to the particular training set it saw
- Definition of variance: $\operatorname{Var}[\hat f(x)] = E\big[(\hat f(x) - E[\hat f(x)])^2\big]$, where $\hat f$ is the model fitted on a random training sample
- Expectation of a test sample $y$: assuming $y = f(x) + \epsilon$ with noise $E[\epsilon] = 0$, we have $E[y] = f(x)$
The error discussed in Part 02 of this series can be decomposed into bias and variance. A simple model produces error dominated by large bias, a situation called underfitting; a complex model produces error dominated by large variance, a situation called overfitting.
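For reference, the standard decomposition of the expected squared error at a point $x$ (using the definitions above, with $\sigma^2$ the variance of the noise $\epsilon$):

$$E\big[(y - \hat f(x))^2\big] = \underbrace{\big(f(x) - E[\hat f(x)]\big)^2}_{\text{bias}^2} + \underbrace{E\big[(\hat f(x) - E[\hat f(x)])^2\big]}_{\text{variance}} + \sigma^2$$

Underfitting inflates the first term; overfitting inflates the second.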
2.2 Model Selection
- Should NOT do: validate model choices directly on the Testing Set. The Testing Set has its own bias, so a model picked this way will perform worse on truly unseen data than its test score suggests.
Holdout Method
- Split the dataset D into two mutually exclusive sets: a training set S and a test set T; train the model on S and evaluate it on T.
- Keep the data distributions of S and T as consistent as possible, so that the split itself does not introduce extra bias into the final result (a scikit-learn sketch follows).
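A minimal sketch of a holdout split with scikit-learn; the data here is an assumed placeholder standing in for dataset D:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumed placeholder data standing in for dataset D
X = np.random.rand(100, 2)
y = np.random.rand(100)

# Hold out 20% as the test set T; the rest is the training set S.
# Shuffling (the default) helps keep the two distributions similar;
# for classification, stratify=<labels> keeps class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```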
N-fold Cross Validation
Used to mitigate the bias of relying on a single Validation Set split: divide the training data into N folds, train on N-1 of them and validate on the remaining fold, rotating through all N folds and averaging the validation scores (a sketch follows).
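A corresponding sketch with N = 5, reusing X and y from the holdout sketch above; LinearRegression is just an assumed stand-in model:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Each of the 5 folds serves once as the validation set; the model is
# refit on the other 4 folds each time, and the scores are averaged.
scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring="neg_mean_squared_error")
print(-scores.mean())  # average validation MSE over the folds
```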
3 Optimization Methods
- Vanilla Gradient Descent
- Adagrad: divides each parameter w's learning rate by the root mean square of its previous derivatives (see the sketch after this list)
- Gradient Descent
- Stochastic Gradient Descent
- Gradient Descent with Feature Scaling: rescale the features to roughly the same magnitude so the loss surface is better conditioned
- Steepest Gradient Descent
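Since the list only names Adagrad, here is a minimal NumPy sketch of its update rule on assumed toy data (y = 3x + 2; the learning rate, step count, and data are all assumptions, and the accumulated-sum form shown here matches the root-mean-square formulation up to the time-decay factor in the learning rate):

```python
import numpy as np

# Assumed toy problem: fit y = w*x + b to data generated from y = 3x + 2
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)          # features already on a small, uniform scale
y = 3 * x + 2 + 0.1 * rng.standard_normal(100)

w, b = 0.0, 0.0
eta = 0.3                            # base learning rate (assumed)
acc_w, acc_b = 0.0, 0.0              # accumulated squared gradients
eps = 1e-8                           # guards against division by zero

for step in range(3000):
    y_hat = w * x + b
    grad_w = np.mean(2 * (y_hat - y) * x)   # dMSE/dw
    grad_b = np.mean(2 * (y_hat - y))       # dMSE/db
    acc_w += grad_w ** 2
    acc_b += grad_b ** 2
    # Adagrad: each parameter's step is scaled by the root of its own
    # accumulated squared gradients, so parameters with persistently
    # large gradients get smaller effective learning rates
    w -= eta / (np.sqrt(acc_w) + eps) * grad_w
    b -= eta / (np.sqrt(acc_b) + eps) * grad_b

print(w, b)  # should approach 3 and 2
```

The per-parameter scaling also partially compensates for poorly scaled features, which is the same conditioning problem the feature-scaling item above addresses.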
SGD and MBGD code
```python
# Stochastic Gradient Descent
import numpy as np

# Assumed toy data so the snippet runs standalone: y = 4 + 3x + noise
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]  # prepend the bias feature x0 = 1

n_epochs = 50
t0, t1 = 5, 50  # learning-schedule hyperparameters

def learning_schedule(t):
    return t0 / (t + t1)  # learning rate decays over time

theta = np.random.randn(2, 1)  # random initialization

for epoch in range(n_epochs):
    for i in range(m):
        random_index = np.random.randint(m)  # draw one sample at random
        xi = X_b[random_index:random_index + 1]
        yi = y[random_index:random_index + 1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)  # MSE gradient on one sample
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients

print(theta)  # should end up near [[4.], [3.]]
```
```python
# Stochastic Gradient Descent with Scikit-Learn
from sklearn.linear_model import SGDRegressor

# penalty=None disables regularization; eta0 is the initial learning rate
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1)
sgd_reg.fit(X, y.ravel())  # ravel() flattens y to the 1-D shape sklearn expects
```
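As a quick sanity check (assuming the same toy data as the NumPy SGD snippet above), the fitted parameters can be read off the estimator:

```python
# intercept_ and coef_ hold the learned bias and weight
print(sgd_reg.intercept_, sgd_reg.coef_)  # expect values near 4 and 3
```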
```python
# MBGD: Mini-Batch Gradient Descent
import numpy as np
import random

def gen_line_data(sample_num=100):
    """Generate noise-free samples of y = 3*x1 + 4*x2."""
    x1 = np.linspace(0, 9, sample_num)
    x2 = np.linspace(4, 13, sample_num)
    x = np.concatenate(([x1], [x2]), axis=0).T
    y = np.dot(x, np.array([3, 4]).T)
    return x, y

def mbgd(samples, y, step_size=0.01, max_iter_count=10000, batch_size=0.2):
    sample_num, dim = samples.shape
    y = y.flatten()
    w = np.ones((dim,), dtype=np.float32)
    loss = 10       # sentinel so the loop starts
    iter_count = 0
    while loss > 0.001 and iter_count < max_iter_count:
        loss = 0
        error = np.zeros((dim,), dtype=np.float32)
        # sample a random mini-batch (20% of the data by default)
        index = random.sample(range(sample_num),
                              int(np.ceil(sample_num * batch_size)))
        batch_samples = samples[index]
        batch_y = y[index]
        # accumulate the gradient over the mini-batch only
        for i in range(len(batch_samples)):
            predict_y = np.dot(w.T, batch_samples[i])
            for j in range(dim):
                error[j] += (batch_y[i] - predict_y) * batch_samples[i][j]
        for j in range(dim):
            # error holds sum((y - y_hat) * x), so += moves w downhill on MSE
            w[j] += step_size * error[j] / sample_num
        # evaluate the loss on the full dataset to decide when to stop
        for i in range(sample_num):
            predict_y = np.dot(w.T, samples[i])
            sq_err = (1 / (sample_num * dim)) * np.power(predict_y - y[i], 2)
            loss += sq_err
        iter_count += 1
    return w

if __name__ == '__main__':
    samples, y = gen_line_data()
    w = mbgd(samples, y)
    print(w)  # with noise-free data this converges toward [3. 4.]
```
Regression model evaluation metrics
Omitted.