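- Linear regression model: what linear regression requires of its features; handling long-tailed distributions; understanding the linear regression model.
- Building the linear regression model
- Applying a log(x+1) transform brings a long-tailed distribution closer to a normal distribution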
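- Model performance validation: evaluation functions versus objective functions; cross-validation; leave-one-out validation; validation for time series problems; plotting learning curves; plotting validation curves.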
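The points above can be sketched in a few lines. This is a minimal illustration, not the notes' own pipeline: the synthetic data, the `skewness` helper, and the `splitters` dictionary are all assumptions made for the example. It applies `np.log1p` to a long-tailed target, then scores the same linear model with k-fold, leave-one-out, and time-series splits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import (KFold, LeaveOneOut, TimeSeriesSplit,
                                     cross_val_score)

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): the target is log-normal,
# so it has a long right tail, as price-like variables often do.
X = rng.normal(size=(200, 3))
y = np.exp(1.0 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.1, size=200))


def skewness(v):
    """Sample skewness (third standardized moment); hypothetical helper."""
    v = np.asarray(v, dtype=float)
    return np.mean((v - v.mean()) ** 3) / v.std() ** 3


# log(x + 1) transform; predictions can be mapped back with np.expm1.
y_log = np.log1p(y)
skew_raw, skew_log = skewness(y), skewness(y_log)

# Evaluation function: mean absolute error on the transformed target.
mae = make_scorer(mean_absolute_error)
model = LinearRegression()

# The same model evaluated under three splitting strategies.
# TimeSeriesSplit always trains on earlier rows and tests on later ones,
# which only makes sense when the rows are ordered in time.
splitters = {
    "5-fold": KFold(n_splits=5, shuffle=True, random_state=0),
    "leave-one-out": LeaveOneOut(),
    "time series": TimeSeriesSplit(n_splits=5),
}
cv_mae = {
    name: cross_val_score(model, X, y_log, cv=cv, scoring=mae).mean()
    for name, cv in splitters.items()
}

print(f"skewness before: {skew_raw:.2f}, after log1p: {skew_log:.2f}")
for name, score in cv_mae.items():
    print(f"{name}: MAE = {score:.4f}")
```

Leave-one-out fits one model per sample, so it is usually only practical on small datasets.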
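# Plot the learning curve and the validation curve
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import learning_curve, validation_curve
# In IPython/Jupyter, `learning_curve?` displays the function's documentation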
def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None, n_jobs=1, train_size=np.linspace(.1, 1.0, 5)):
    """Plot mean training and cross-validation scores against training-set size."""
    plt.figure()
    plt.title(title)
    if ylim is not None:
        plt.ylim(*ylim)
    plt.xlabel('Training examples')
    plt.ylabel('score')
    # The scorer is mean absolute error, so lower values are better
    train_sizes, train_scores, test_scores = learning_curve(estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_size, scoring=make_scorer(mean_absolute_error))
    # Each row holds one training-set size; average over the CV folds
    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    test_scores_std = np.std(test_scores, axis=1)
    plt.grid()
    # Shaded regions: mean +/- one standard deviation across folds
    plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.1,
                     color="r")
    plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
                     test_scores_mean + test_scores_std, alpha=0.1,
                     color="g")
    plt.plot(train_sizes, train_scores_mean, 'o-', color='r',
             label="Training score")
    plt.plot(train_sizes, test_scores_mean, 'o-', color="g",
             label="Cross-validation score")
    plt.legend(loc="best")
    return plt
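The notes import `validation_curve` but never call it. A minimal sketch of how it might be used follows, under assumptions of my own: a `Ridge` model, synthetic data from `make_regression`, and an `alphas` grid chosen purely for illustration. Where a learning curve varies the training-set size, a validation curve varies one hyperparameter and reports training and validation scores for each value.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import validation_curve

# Synthetic regression data (an assumption for illustration).
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# Candidate regularization strengths, spaced on a log scale.
alphas = np.logspace(-3, 3, 7)

# Both returned arrays have shape (len(alphas), n_cv_folds).
train_scores, valid_scores = validation_curve(
    Ridge(), X, y,
    param_name="alpha", param_range=alphas,
    cv=5, scoring=make_scorer(mean_absolute_error),
)

# Average over folds; the score is MAE, so lower is better.
train_mae = train_scores.mean(axis=1)
valid_mae = valid_scores.mean(axis=1)
best_alpha = alphas[np.argmin(valid_mae)]

for a, tr, va in zip(alphas, train_mae, valid_mae):
    print(f"alpha={a:g}: train MAE={tr:.2f}, validation MAE={va:.2f}")
print("alpha with lowest validation MAE:", best_alpha)
```

The two mean curves can be drawn against `alphas` with `plt.semilogx`, in the same style as `plot_learning_curve`. A widening gap between them suggests overfitting, while two high error curves that stay close together suggest underfitting.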
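4. Embedded feature selection: Lasso regression; Ridge regression; decision trees.
6. Model comparison: common linear models; common nonlinear models.
7. Hyperparameter tuning: greedy search; grid search; Bayesian optimization.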
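The last three topics can be illustrated together in one short sketch. The synthetic dataset, the `alpha` values, and the tree depth are all assumptions picked for the example. It shows the embedded-selection effect (Lasso can drive coefficients to exactly zero, Ridge only shrinks them, and a decision tree exposes `feature_importances_`), compares the models by cross-validated MAE, and tunes Lasso with a grid search. Greedy and Bayesian tuning are not shown here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption): 10 features, only 3 carry signal.
X, y = make_regression(n_samples=300, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

models = {
    "lasso": Lasso(alpha=1.0),
    "ridge": Ridge(alpha=1.0),
    "tree": DecisionTreeRegressor(max_depth=5, random_state=0),
}

# Embedded feature selection: the selection happens while fitting the model.
fitted = {name: m.fit(X, y) for name, m in models.items()}
n_zero_lasso = int(np.sum(fitted["lasso"].coef_ == 0))  # L1 can zero coefficients
n_zero_ridge = int(np.sum(fitted["ridge"].coef_ == 0))  # L2 only shrinks them
importances = fitted["tree"].feature_importances_       # sums to 1

# Model comparison: 5-fold cross-validated MAE (sklearn returns it negated).
cv_mae = {
    name: -cross_val_score(m, X, y, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    for name, m in models.items()
}

# Grid search: try every candidate alpha and keep the best by CV score.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(Lasso(max_iter=10000), param_grid, cv=5,
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
best_alpha = search.best_params_["alpha"]

print("zero coefficients: lasso =", n_zero_lasso, ", ridge =", n_zero_ridge)
print("tree importances:", np.round(importances, 3))
print("CV MAE:", {k: round(v, 2) for k, v in cv_mae.items()})
print("best Lasso alpha:", best_alpha)
```

Grid search costs one cross-validation run per parameter combination, so it becomes expensive as the grid grows. That cost is the usual motivation for greedy, one-parameter-at-a-time tuning or for Bayesian optimization.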