Saving and calling a Python model: the joblib module



Background

After repeated tuning on a known data set, you end up with a fairly accurate model for predicting or classifying new data in the same format. Do you have to re-run the source data and training code every time you want to use it?

A common practice is to package the trained model into a model file and then call that file directly for subsequent predictions.
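
For intuition, here is a minimal round-trip sketch of that workflow. The model, data set, and file name are illustrative stand-ins, not the ones used later in this article:

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train once
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Package the trained model into a file...
joblib.dump(model, 'iris_lr_model.dat')

# ...and later reload it for prediction, without retraining
reloaded = joblib.load('iris_lr_model.dat')
print(reloaded.predict(X[:5]))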


1. Save the best model

dump

  • joblib.dump(value, filename, compress=0, protocol=None)
    • value: any Python object; the object to store to disk.
    • filename: str, pathlib.Path, or file object. The file object or file path in which to store the object. If a path, it may end with one of the supported extensions ('.z', '.gz', '.bz2', '.xz', '.lzma').
    • compress: int from 0 to 9, bool, or 2-tuple. Optional compression level for the data. 0 or False means no compression; higher values mean more compression but also slower read and write times. A value of 3 is usually a good compromise. If compress is True, a compression level of 3 is used. If compress is a 2-tuple, the first element must be one of the supported compressor names ('zlib', 'gzip', 'bz2', 'lzma', 'xz') and the second element must be an integer from 0 to 9 giving the compression level (see the snippet after this list).
    • protocol: the pickle protocol version; it behaves exactly like the protocol parameter in pickle and can usually be left at its default.
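
A small sketch of the two compress forms (the DummyClassifier and file names are only placeholders):

import joblib
from sklearn.dummy import DummyClassifier

clf = DummyClassifier(strategy='most_frequent').fit([[0], [1]], [0, 1])

# compress as an int: level 3 is usually a good compromise
joblib.dump(clf, 'model.pkl', compress=3)

# compress as a 2-tuple: (compressor name, compression level)
joblib.dump(clf, 'model.pkl.gz', compress=('gzip', 3))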

Example

  1. Import the data
import pandas as pd

# Training set
file_pos="F:\\python_machine_learing_work\\501_model\\data\\训练集\\train_data_only_one.csv"
data_pos=pd.read_csv(file_pos, encoding='utf-8')

# Test set
val_pos="F:\\python_machine_learing_work\\501_model\\data\\测试集\\test_data_table_only_one.csv"
data_val=pd.read_csv(val_pos, encoding='utf-8')
  2. Split the data
# Important variables (feature columns)
ipt_col=['called_rate', 'calling_called_act_hour', 'calling_called_distinct_rp', 'calling_called_distinct_cnt', 'star_level_int', 'online_days', 'calling_called_raom_cnt', 'cert_cnt', 'white_flag_0', 'age', 'calling_called_cdr_less_15_cnt', 'white_flag_1', 'calling_called_same_area_rate', 'volte_cnt', 'cdr_duration_sum', 'calling_hour_cnt', 'cdr_duration_avg', 'calling_pre7_rate', 'cdr_duration_std', 'calling_disperate', 'calling_out_area_rate', 'calling_distinct_out_op_area_cnt','payment_type_2.0', 'package_price_group_2.0', 'is_vice_card_1.0']

# Label column: the article does not show where target_col is defined,
# so 'label' here is only an assumed placeholder name
target_col='label'

# Split the data into a training set and a test set
def train_test_spl(train_data, val_data):
    X_train=train_data[ipt_col]
    X_test=val_data[ipt_col]
    y_train=train_data[target_col]
    y_test=val_data[target_col]
    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = train_test_spl(data_pos, data_val)
  3. Train the model
from sklearn.model_selection import GridSearchCV

def model_train(X_train, y_train, model):
    # Import the XGBoost model
    from xgboost.sklearn import XGBClassifier

    if model=='XGB':
        parameters = {'max_depth': [3, 5, 10, 15, 20, 25],
                      'learning_rate': [0.1, 0.3, 0.6],
                      'subsample': [0.6, 0.7, 0.8, 0.85, 0.95],
                      'colsample_bytree': [0.5, 0.6, 0.7, 0.8, 0.9]}

        xlf = XGBClassifier(n_estimators=50)
        grid = GridSearchCV(xlf, param_grid=parameters, scoring='accuracy', cv=3)
        grid.fit(X_train, y_train)
        best_params = grid.best_params_
        res_model = XGBClassifier(max_depth=best_params['max_depth'],
                                  learning_rate=best_params['learning_rate'],
                                  subsample=best_params['subsample'],
                                  colsample_bytree=best_params['colsample_bytree'])
        res_model.fit(X_train, y_train)
    else:
        raise ValueError("Unsupported model type: %s" % model)
    return res_model

xgb_model = model_train(X_train, y_train, model='XGB')
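
A side note on the grid search above: since GridSearchCV defaults to refit=True, the best parameter combination is already refit on the full training set once grid.fit finishes, so the manual re-instantiation could be shortened to:

res_model = grid.best_estimator_  # already refit on X_train with the best parameters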
  4. Save the model
# Import the package
import joblib

# Save the model (the file name must match the one used when loading it back in section 2)
joblib.dump(xgb_model, 'train_xgb_importance_model.dat', compress=3)

2. Load the model and use it for prediction

load

  • joblib.load(filename, mmap_mode=None)
    • filename: str, pathlib.Path, or file object. The file or file path from which to load the object.
    • mmap_mode: {None, 'r+', 'r', 'w+', 'c'}, optional. If not None, memory-map the array from disk. This mode has no effect on compressed files. Note that in this case the reconstructed object may no longer exactly match the original pickled object (see the snippet after this list).
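
A small sketch of mmap_mode in action (the array size and file name are made up). The file must be saved without compression for memory mapping to take effect:

import joblib
import numpy as np

arr = np.arange(1_000_000)
joblib.dump(arr, 'big_array.pkl')  # uncompressed, so mmap can apply

# Read-only memory map: the array is not fully copied into RAM
arr_mm = joblib.load('big_array.pkl', mmap_mode='r')
print(arr_mm[:5])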
  1. Load the model
# Load the model
load_model_xgb_importance = joblib.load("F:\\python_machine_learing_work\\501_model\\data\\测试集\\train_xgb_importance_model.dat")

# Use the model to predict (model_predict and alpha are the author's own
# helper function and parameter, which are not defined in this article)
y_pred_rf = model_predict(load_model_xgb_importance, X_test, alpha=alpha)
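
Since model_predict is the author's own helper and is not shown in the article, here is how the loaded model can be used directly through the standard scikit-learn API instead:

from sklearn.metrics import accuracy_score

# Predict with the loaded model directly
y_pred = load_model_xgb_importance.predict(X_test)

# Evaluate against the held-out labels
print('accuracy:', accuracy_score(y_test, y_pred))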

Source: https://blog.csdn.net/sodaloveer/article/details/129857727