Data mining: modeling and tuning

Common models:

Linear regression model, decision tree model, GBDT model, XGBoost model, LightGBM model

For simple linear regression, you can use sklearn:

from sklearn.linear_model import LinearRegression

# Note: the `normalize` parameter was removed from LinearRegression in
# scikit-learn 1.2; scale the features beforehand (e.g. StandardScaler) if needed.
model = LinearRegression()
model = model.fit(train_X, train_y)
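After fitting, you can sanity-check the learned parameters (a quick sketch; `train_X` and `train_y` are assumed to be the prepared training data):

# Inspect the fitted model: the intercept and one weight per feature.
print('intercept:', model.intercept_)
print('coefficients:', model.coef_)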

Once we have built a simple model, we may find that it predicts or fits poorly; in that case we should first check whether there is a problem with the data. Here, a "problem" mostly means that the data does not meet the conditions (assumptions) of the model.

Before choosing a model, we must first consider the conditions under which the model applies and whether our data satisfies its basic assumptions. To give a crude example: if we blindly use a straight line to fit a curve, we will inevitably get a poor result.
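For example, if the target is heavily right-skewed (as prices often are), its relationship to the features is unlikely to be linear. A minimal sketch, assuming `train_y` is such a target, is to log-transform it first, which also matches the `train_y_ln` naming used in the code below:

import numpy as np

# Log-transform a skewed target so its relationship to the features is
# closer to linear; np.log1p computes log(1 + x) and handles zeros safely.
train_y_ln = np.log1p(train_y)

# Predictions can be mapped back to the original scale with the inverse:
# pred = np.expm1(model.predict(test_X))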

Cross validation

Five-fold cross validation

Using sklearn:

from sklearn.model_selection import cross_val_score
# Evaluate a score by cross-validation
from sklearn.metrics import mean_absolute_error, make_scorer
# mean_absolute_error: mean absolute error regression loss
# make_scorer: build a scorer from a performance metric or loss function
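Putting these together (a minimal sketch; the estimator and data names are assumed from the earlier snippet):

# Five-fold cross-validation of linear regression, scored by MAE on each fold.
scores = cross_val_score(LinearRegression(),
                         X=train_X, y=train_y,
                         cv=5,
                         scoring=make_scorer(mean_absolute_error))
print('MAE per fold:', scores)
print('mean MAE:', scores.mean())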

Embedded feature selection

With filter and wrapper feature selection methods, the feature selection process is clearly separate from the training of the learner. Embedded feature selection, by contrast, performs feature selection automatically during training. The most common embedded methods are L1 and L2 regularization: adding L1 regularization to linear regression yields Lasso regression, while adding L2 regularization yields ridge regression.

# Import the estimators
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
import pandas as pd

# Build the group of models to compare
models = [LinearRegression(),
          Ridge(),
          Lasso()]

# Cross-validate each model and collect its fold scores in the result dict
result = dict()
for model in models:
    model_name = str(model).split('(')[0]
    scores = cross_val_score(model, X=train_X, y=train_y_ln, verbose=0, cv=5,
                             scoring=make_scorer(mean_absolute_error))
    result[model_name] = scores
    print(model_name + ' is finished')

# Tabulate the scores for comparison
result = pd.DataFrame(result)
result.index = ['cv' + str(x) for x in range(1, 6)]
result
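To see the embedded selection itself, you can inspect which Lasso coefficients were shrunk exactly to zero (a sketch, assuming `train_X` is a pandas DataFrame so column names are available):

# Fit Lasso and list the features it effectively dropped (zero coefficients).
lasso = Lasso().fit(train_X, train_y_ln)
coef = pd.Series(lasso.coef_, index=train_X.columns)
print('dropped features:', list(coef[coef == 0].index))
print('kept features:')
print(coef[coef != 0])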

Model tuning

  • Greedy tuning
  • Grid search tuning (see the sketch below)
  • Bayesian tuning
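As a sketch of grid search tuning (the estimator and parameter grid here are illustrative assumptions, not from the original post):

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Exhaustively try each max_depth with five-fold CV; GridSearchCV maximizes
# the score, so MAE is used with its sign flipped.
param_grid = {'max_depth': [3, 5, 7, 9]}
search = GridSearchCV(DecisionTreeRegressor(),
                      param_grid,
                      cv=5,
                      scoring='neg_mean_absolute_error')
search.fit(train_X, train_y_ln)
print('best params:', search.best_params_)
print('best CV MAE:', -search.best_score_)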

Source: blog.csdn.net/qq_45175218/article/details/105255693