Python Automated Machine Learning Tool: TPOT

TPOT is an open-source machine learning project. Its repository is at https://github.com/EpistasisLab/tpot

1. TPOT with code

step 1: import the classes
from tpot import TPOTClassifier    # classifier
from tpot import TPOTRegressor     # regressor

step 2: instantiate (default)
# create a default classifier
default_pipeline_optimizer_classifier = TPOTClassifier()
# create a default regressor
default_pipeline_optimizer_regressor = TPOTRegressor()

step 2: instantiate (custom)
# create a custom classifier
custom_pipeline_optimezer_classifier = TPOTClassifier(generations=50, population_size=50, cv=5, random_state=100, verbosity=2)
# create a custom regressor
custom_pipeline_optimezer_regressor = TPOTRegressor(generations=5, population_size=5, cv=5, random_state=20, verbosity=1)

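step 3: prepare the training and test sets
# X is the feature matrix and y the target vector, loaded from your own data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)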

step 4: train
custom_pipeline_optimezer_regressor.fit(X_train, y_train)

step 5: test
print(custom_pipeline_optimezer_regressor.score(X_test, y_test))

step 6: export the corresponding Python code for the optimized pipeline
custom_pipeline_optimezer_regressor.export('tpot_exported_pipeline.py')

2. scoring function

Option 1: pass a string to the scoring parameter
Accepted values are:
'accuracy', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy',
'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted',
'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_median_absolute_error',
'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted',
'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted',
'roc_auc', 'my_module.scorer_name*'

Option 2: a user-defined scorer
from sklearn.metrics import make_scorer, mean_squared_error

# Make a custom metric function
def my_scoring_func(y_true, y_pred):
    return mean_squared_error(y_true, y_pred)

# Make a custom scorer from the custom metric function
# Note: greater_is_better=False in make_scorer below means that the scoring function should be minimized.
my_scorer = make_scorer(my_scoring_func, greater_is_better=False)
custom_pipeline_optimezer_regressor = TPOTRegressor(generations=5, population_size=5, cv=5, random_state=20, verbosity=1, scoring=my_scorer)
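
To make the sign convention concrete, here is a minimal sketch that applies a scorer built with greater_is_better=False to a fitted model. It assumes scikit-learn and NumPy are installed. The toy data and the DummyRegressor baseline are illustrative choices and are not part of the original walkthrough.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer, mean_squared_error


def my_scoring_func(y_true, y_pred):
    # Plain mean squared error: lower is better.
    return mean_squared_error(y_true, y_pred)


# greater_is_better=False makes the scorer return the negated metric,
# so "a larger score is better" still holds for whatever maximizes it.
my_scorer = make_scorer(my_scoring_func, greater_is_better=False)

# Toy data, made up for illustration. The features do not matter here
# because DummyRegressor always predicts the mean of the training targets.
X = np.zeros((4, 1))
y = np.array([1.0, 2.0, 3.0, 4.0])
model = DummyRegressor(strategy="mean").fit(X, y)

raw_mse = my_scoring_func(y, model.predict(X))  # the metric itself
score = my_scorer(model, X, y)                  # the negated metric
print(raw_mse, score)
```

The scorer is called as scorer(estimator, X, y), and its value is the negative of the raw metric, which is why minimized metrics show up as negative scores.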

3. config_dict

There are four default configuration options:

  1. Default TPOT
  2. TPOT light
  3. TPOT MDR
  4. TPOT sparse

Details: http://epistasislab.github.io/tpot/using/#built-in-tpot-configurations

custom_pipeline_optimezer_regressor = TPOTRegressor(generations=5, population_size=5, cv=5, random_state=20,
                                                    verbosity=1, config_dict='TPOT light')

4. user-defined config

tpot_config = {
    'sklearn.naive_bayes.GaussianNB': {
    },

    'sklearn.naive_bayes.BernoulliNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    },

    'sklearn.naive_bayes.MultinomialNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    }
}
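# The Naive Bayes operators above are classifiers, so the config is passed to TPOTClassifier
custom_pipeline_optimezer_classifier = TPOTClassifier(generations=5, population_size=5, cv=5, random_state=20,
                                                      verbosity=1, config_dict=tpot_config)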
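
To get a feel for how much such a config dictionary allows, the sketch below counts the hyperparameter combinations each operator contributes. It uses only the standard library (Python 3.8+ for math.prod). The helper name count_settings is hypothetical and is not part of TPOT. It counts single-operator settings only; TPOT also chains operators into pipelines, so the space it actually searches is larger.

```python
from math import prod

# Same dictionary as above: operator import path -> hyperparameter grid.
tpot_config = {
    'sklearn.naive_bayes.GaussianNB': {
    },
    'sklearn.naive_bayes.BernoulliNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    },
    'sklearn.naive_bayes.MultinomialNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1., 10., 100.],
        'fit_prior': [True, False]
    }
}


def count_settings(param_grid):
    # Number of distinct value combinations for one operator.
    # An empty grid means "defaults only" and counts as one setting.
    return prod(len(values) for values in param_grid.values())


settings_per_operator = {
    name: count_settings(grid) for name, grid in tpot_config.items()
}
total_settings = sum(settings_per_operator.values())

for name, n in settings_per_operator.items():
    print(f"{name}: {n}")
print("total:", total_settings)
```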

5. training in a distributed environment

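# sklearn.externals.joblib has been removed from recent scikit-learn releases, so import joblib directly.
# The separate "import distributed.joblib" should not be needed with recent joblib and distributed versions.
import joblib
from dask.distributed import Client

# connect to the cluster
client = Client('scheduler-address')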

# create the estimator normally
estimator = TPOTClassifier(n_jobs=-1)

# perform the fit in this context manager
with joblib.parallel_backend("dask"):
    estimator.fit(X, y)

6. a real project

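The goal of the project is to predict the inflow of a downstream reservoir. The source data is laid out as follows:

The first column is the inflow of the downstream reservoir, the second column is the outflow of the upstream reservoir, and the remaining columns are rainfall readings from the observation stations between the upstream and downstream reservoirs.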
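
As a rough sketch of how that column layout maps onto a target and a feature matrix, the snippet below splits each row into the first column (the value to predict) and the remaining columns (the inputs). The numbers are made up purely for illustration and the two rainfall columns are hypothetical; the real data set is not reproduced here.

```python
import csv
import io

# Made-up rows for illustration only.
# Columns: downstream inflow, upstream outflow, rainfall at two hypothetical stations.
raw = """\
120.5,98.0,3.2,0.0
131.0,101.5,5.1,1.4
118.2,95.3,0.0,0.0
"""

rows = [[float(v) for v in row] for row in csv.reader(io.StringIO(raw))]

# Target: the first column (downstream inflow).
y = [row[0] for row in rows]
# Features: upstream outflow plus all rainfall columns.
X = [row[1:] for row in rows]

print(y)
print(X[0])
```

In practice X and y would typically be converted to NumPy arrays, split into training and test sets, and then passed to a TPOTRegressor as in the earlier steps.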


Reposted from www.cnblogs.com/54hys/p/10740913.html