Data Mining Learning | Task 4: Modeling and Parameter Tuning

  1. Linear regression model: feature requirements for linear regression; handling long-tailed distributions; understanding the linear regression model.
  • Fitting a linear regression model
  • Apply the log(x + 1) transformation so that the long-tailed distribution becomes closer to a normal distribution
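A minimal sketch of the log(x + 1) idea, assuming scikit-learn's `LinearRegression` and a synthetic long-tailed target (the data, seed, coefficients and the helper `skewness` are illustrative, not from the original task): fit the model on `np.log1p(y)` and map predictions back with `np.expm1`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def skewness(a):
    """Sample skewness (third standardized moment); > 0 means a long right tail."""
    d = a - a.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


# Illustrative data: log1p(y) is linear in X, so y itself is long-tailed, like a price.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_coef = np.array([0.8, -0.5, 0.3])
y = np.expm1(5.0 + X @ true_coef + rng.normal(scale=0.1, size=500))

# Fit on the transformed target, then invert the predictions back to the original scale.
model = LinearRegression().fit(X, np.log1p(y))
y_pred = np.expm1(model.predict(X))

print(f"skew(y)        = {skewness(y):.2f}")
print(f"skew(log1p(y)) = {skewness(np.log1p(y)):.2f}")
print("coef:", np.round(model.coef_, 2))
```

The model's raw predictions live in log space, so `np.expm1` must be applied before computing an error such as MAE on the original price scale.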
  2. Model performance validation: evaluation functions and objective functions; cross-validation; leave-one-out validation; validation for time-series problems; plotting learning curves; plotting validation curves.
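The validation schemes listed above can be compared with scikit-learn's splitter classes. This is a sketch on synthetic data (the data and variable names are illustrative); `TimeSeriesSplit` only makes sense if the rows are assumed to be sorted by time, because it always trains on earlier rows and tests on later ones. MAE is requested as `"neg_mean_absolute_error"` because scikit-learn treats higher scores as better.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, TimeSeriesSplit, cross_val_score

# Illustrative data; for TimeSeriesSplit we assume row order is chronological.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=60)
model = LinearRegression()

splitters = {
    "5-fold CV": KFold(n_splits=5, shuffle=True, random_state=0),
    "leave-one-out": LeaveOneOut(),        # one fold per sample (60 fits here)
    "time-series split": TimeSeriesSplit(n_splits=5),  # train on the past, test on the future
}
mae, folds = {}, {}
for name, cv in splitters.items():
    # scikit-learn maximizes scores, so MAE is exposed as its negative; flip the sign back.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    mae[name] = -scores.mean()
    folds[name] = len(scores)
    print(f"{name:18s} folds={folds[name]:3d}  MAE={mae[name]:.3f}")
```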
```python
# Plot learning curves and validation curves
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import learning_curve, validation_curve


def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None, n_jobs=1,
                        train_size=np.linspace(.1, 1.0, 5)):
    """Plot training and cross-validation MAE against the number of training samples."""
    plt.figure()
    plt.title(title)
    if ylim is not None:
        plt.ylim(*ylim)
    plt.xlabel('Training examples')
    plt.ylabel('Score (MAE)')
    # make_scorer(mean_absolute_error) reports the plain MAE, so lower is better here
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_size,
        scoring=make_scorer(mean_absolute_error))
    # Mean and standard deviation over the CV folds for each training-set size
    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    test_scores_std = np.std(test_scores, axis=1)
    plt.grid()
    # Shaded bands show mean ± one standard deviation
    plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.1, color="r")
    plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
                     test_scores_mean + test_scores_std, alpha=0.1, color="g")
    plt.plot(train_sizes, train_scores_mean, 'o-', color='r', label="Training score")
    plt.plot(train_sizes, test_scores_mean, 'o-', color="g", label="Cross-validation score")
    plt.legend(loc="best")
    return plt
```

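The learning-curve code imports `validation_curve` but does not use it. A validation curve instead varies one hyperparameter and reports training and cross-validation error for each value. This sketch assumes a `DecisionTreeRegressor` with `max_depth` as the swept parameter and uses synthetic data; both choices are illustrative. It prints the numbers rather than plotting them; the same `fill_between`/`plot` calls as above would draw the curve.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

# Illustrative data: a noisy sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

depths = list(range(1, 11))
train_scores, test_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths,
    cv=5, scoring="neg_mean_absolute_error")

# Negate to get MAE, then average over the 5 folds for each depth.
train_mae = -train_scores.mean(axis=1)
cv_mae = -test_scores.mean(axis=1)
for d, tr, cv in zip(depths, train_mae, cv_mae):
    print(f"max_depth={d:2d}  train MAE={tr:.3f}  CV MAE={cv:.3f}")
best_depth = depths[int(np.argmin(cv_mae))]
print("best max_depth:", best_depth)
```

Training error keeps falling as the tree deepens, while the cross-validation error bottoms out and then rises again; the gap between the two curves at large depths is the overfitting the validation curve is meant to expose.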
3. Embedded feature selection: Lasso regression; Ridge regression; decision trees.
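In embedded selection, the model chooses features as a side effect of training. A sketch with scikit-learn on synthetic data where only the first two of five features matter (the data and the `alpha`/`max_depth` values are illustrative assumptions): the L1 penalty of Lasso sets irrelevant coefficients exactly to zero, Ridge's L2 penalty only shrinks them, and a decision tree exposes `feature_importances_`.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.tree import DecisionTreeRegressor

# Illustrative data: only features 0 and 1 influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can drive coefficients exactly to 0
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients, rarely to exactly 0
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

print("Lasso coef:      ", np.round(lasso.coef_, 3))
print("Ridge coef:      ", np.round(ridge.coef_, 3))
print("Tree importances:", np.round(tree.feature_importances_, 3))

# Embedded selection with Lasso: keep the features whose coefficient is non-zero.
selected = np.flatnonzero(lasso.coef_)
print("Features kept by Lasso:", selected)
```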
4. Model comparison: common linear models; common nonlinear models.
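One way to compare linear and nonlinear models is to score each under the same cross-validation splits. This sketch uses scikit-learn estimators only and a synthetic target with a sine term and an interaction (all data and hyperparameters are illustrative), so the nonlinear models have structure that a straight line cannot capture.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Illustrative target with nonlinear terms.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 4))
y = X[:, 0] + 2 * np.sin(2 * X[:, 1]) + X[:, 2] * X[:, 3] + rng.normal(scale=0.2, size=400)

models = {
    # Linear models
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.01),
    # Nonlinear models
    "DecisionTree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # identical splits for every model
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    results[name] = -scores.mean()

for name, mae in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} CV MAE = {mae:.3f}")
```

Fixing the `KFold` seed means every model sees exactly the same train/test partitions, so the differences in MAE reflect the models rather than the split.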
5. Model tuning: greedy tuning; grid search; Bayesian optimization.
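A sketch of the first two tuning strategies, assuming a `GradientBoostingRegressor` and illustrative candidate values and data. Greedy tuning fixes the best value of one parameter before moving to the next, while `GridSearchCV` tries every combination. Bayesian optimization usually relies on a third-party package such as `hyperopt` and is not sketched here. The helper `cv_mae` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Illustrative data and a fixed split so every setting is scored on the same folds.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 4))
y = X[:, 0] + 2 * np.sin(2 * X[:, 1]) + X[:, 2] * X[:, 3] + rng.normal(scale=0.2, size=200)
cv = KFold(n_splits=5, shuffle=True, random_state=0)


def cv_mae(**params):
    """Hypothetical helper: mean 5-fold CV MAE of a GradientBoostingRegressor."""
    model = GradientBoostingRegressor(random_state=0, **params)
    return -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error").mean()


candidates = {
    "learning_rate": [0.03, 0.1, 0.3],
    "max_depth": [2, 3, 4],
    "n_estimators": [50, 100, 200],
}

# Greedy tuning: optimize one parameter at a time, keeping earlier winners fixed.
best_params = {}
for name, values in candidates.items():
    scores = {v: cv_mae(**best_params, **{name: v}) for v in values}
    best_params[name] = min(scores, key=scores.get)
greedy_mae = cv_mae(**best_params)
print("greedy:", best_params, f"MAE={greedy_mae:.3f}")

# Grid search: evaluate all 3 * 3 * 3 = 27 combinations.
grid = GridSearchCV(GradientBoostingRegressor(random_state=0), candidates,
                    cv=cv, scoring="neg_mean_absolute_error")
grid.fit(X, y)
print("grid:  ", grid.best_params_, f"MAE={-grid.best_score_:.3f}")
```

Greedy tuning needs only 9 settings here instead of 27, but because it never revisits earlier choices it can miss the best combination; the exhaustive grid can only match or beat it on the same folds.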

Original article: blog.csdn.net/weixin_39294199/article/details/105251618