Automatic model selection

1. The core problems of automated model selection

1.1 Search Space

The search space defines the set of candidate machine learning algorithms for a classification or regression problem, such as KNN, SVM, k-means, and so on.
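As a minimal illustration, a search space can be written down as a mapping from candidate algorithms to their hyperparameter ranges. The algorithm choices and ranges below are hypothetical, just to make the idea concrete:

```python
# A toy search space: candidate algorithms and hyperparameter ranges.
# (Hypothetical choices for illustration; real AutoML systems use far
# richer, conditional spaces.)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

SEARCH_SPACE = {
    KNeighborsClassifier: {"n_neighbors": range(1, 31)},
    SVC: {"C": [0.01, 0.1, 1.0, 10.0, 100.0]},
}
```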

1.2 Search Strategy

The search strategy defines how to explore the search space so that an accurate model is found quickly. Common strategies include Bayesian optimization and evolutionary algorithms.
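Before turning to those, a random-search baseline makes the idea concrete: repeatedly sample a configuration from the search space, evaluate it, and keep the best. This sketch continues from the toy `SEARCH_SPACE` above and is not any particular framework's implementation:

```python
# Random search over the toy SEARCH_SPACE above: sample an algorithm
# and hyperparameters, score with cross-validation, keep the best.
import random
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

best_score, best_model = -1.0, None
for _ in range(20):
    algo, params = random.choice(list(SEARCH_SPACE.items()))
    config = {name: random.choice(list(values)) for name, values in params.items()}
    model = algo(**config)
    score = cross_val_score(model, X, y, cv=5).mean()
    if score > best_score:
        best_score, best_model = score, model

print(best_model, best_score)
```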

2. Automated model selection

2.1 Automated model selection based on Bayesian optimization

2.1.1 Auto-WEKA

Auto-WEKA formalizes automated model selection as the Combined Algorithm Selection and Hyper-parameter optimization (CASH) problem: jointly choosing a learning algorithm and its hyperparameter settings.
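Stated formally (following the Auto-WEKA paper's notation), given a set of algorithms $\mathcal{A} = \{A^{(1)}, \ldots, A^{(R)}\}$, each with hyperparameter space $\Lambda^{(j)}$, and k cross-validation splits of the data, CASH seeks

$$A^{*}_{\lambda^{*}} \in \operatorname*{argmin}_{A^{(j)} \in \mathcal{A},\ \lambda \in \Lambda^{(j)}} \frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\left(A^{(j)}_{\lambda},\ D_{\mathrm{train}}^{(i)},\ D_{\mathrm{valid}}^{(i)}\right)$$

that is, the algorithm and hyperparameter setting that together minimize the average validation loss.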

(1) Search space
Auto-WEKA's search space contains 39 basic components:

  • 27 base classifiers, such as KNN, SVM, LR, etc.
  • 10 meta-classifiers, such as AdaBoostM1, LogitBoost, etc.
  • 2 ensemble methods, Voting and Stacking

Each meta-classifier takes one base classifier as input, and each ensemble method can take up to five base classifiers as input; candidate configurations are evaluated with k-fold cross-validation.

(2) Search strategy
Auto-WEKA solves the CASH problem with one of two optimizers: Sequential Model-based Algorithm Configuration (SMAC) or the Tree-structured Parzen Estimator (TPE). Both belong to the family of SMBO (sequential model-based optimization, i.e., Bayesian optimization) algorithms.
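As a small concrete analogue (not Auto-WEKA itself, which is built on WEKA/Java), the TPE optimizer from the `hyperopt` package can solve a toy CASH problem over scikit-learn models, with the algorithm choice and its hyperparameters in one joint space:

```python
# Toy CASH search with TPE (hyperopt): hp.choice selects the algorithm,
# and each branch carries that algorithm's own hyperparameters.
from hyperopt import fmin, tpe, hp, Trials
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

space = hp.choice("clf", [
    {"type": "knn", "n_neighbors": hp.quniform("n_neighbors", 1, 30, 1)},
    {"type": "svm", "C": hp.loguniform("C", -3, 3)},
])

def objective(params):
    if params["type"] == "knn":
        model = KNeighborsClassifier(n_neighbors=int(params["n_neighbors"]))
    else:
        model = SVC(C=params["C"])
    # Loss = negated mean k-fold cross-validation accuracy.
    return -cross_val_score(model, X, y, cv=5).mean()

best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=Trials())
print(best)
```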

2.1.2 auto-sklearn

auto-sklearn, proposed in 2015, is a Python AutoML framework built on top of the machine learning package scikit-learn.
It takes Auto-WEKA's Bayesian-optimization approach to hyperparameter search and adds two components: meta-learning to initialize the Bayesian optimizer, and automated ensemble construction from the models evaluated during optimization.

(1) Search space
auto-sklearn's search space includes 33 basic components (a basic usage example follows the list):

  • 15 classification algorithms (KNN, AdaBoost, SVM, etc.)
  • 14 feature preprocessing methods (PCA, ICA, and the like)
  • 4 data preprocessing methods (one-hot encoding, imputation, balancing, and rescaling)
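In code, a basic run looks like this (assuming the `auto-sklearn` package in its 0.x API; the time budgets are arbitrary):

```python
# Minimal auto-sklearn classification run: Bayesian optimization over
# its pipeline space, warm-started by meta-learning, with ensemble
# construction at the end (all handled internally by the library).
import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,  # total search budget, in seconds
    per_run_time_limit=30,        # budget per candidate pipeline
)
automl.fit(X_train, y_train)
print(accuracy_score(y_test, automl.predict(X_test)))
```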

(2) Meta-learning
Meta-learning gains experience from previous tasks by reasoning across datasets about how learning algorithms behave. In auto-sklearn, meta-learning is applied to select instantiations of the machine learning framework that are likely to perform well on a given new dataset. More specifically, a set of meta-features that can be computed efficiently even for large datasets is extracted from each dataset, and these meta-features help determine which algorithms to try on a new dataset. This meta-learning step complements the Bayesian optimization at the heart of the AutoML framework: Bayesian optimization is very slow at the start but fine-tunes performance as time goes on, so selecting k configurations via meta-learning and handing their results to the Bayesian optimizer as a warm start lets the two methods reinforce each other.
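The warm-start idea can be sketched in a few lines. Everything here is a toy stand-in: the meta-features, the stored results, and the distance measure are hypothetical, and auto-sklearn's real meta-feature set is much larger:

```python
# Toy meta-learning warm start: describe a dataset by cheap meta-features,
# then reuse the best known configurations of the most similar past datasets.
import numpy as np

def meta_features(X, y):
    n, d = X.shape
    return np.array([np.log(n), np.log(d), float(len(np.unique(y)))])

# Hypothetical knowledge base: meta-features -> best known configuration.
KNOWLEDGE_BASE = [
    (np.array([8.0, 2.3, 10.0]), {"clf": "svm", "C": 1.0}),
    (np.array([5.5, 1.6, 2.0]), {"clf": "knn", "n_neighbors": 5}),
]

def warm_start_configs(X, y, k=1):
    """Return the k configurations whose source datasets look most similar."""
    mf = meta_features(X, y)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda entry: np.linalg.norm(entry[0] - mf))
    return [config for _, config in ranked[:k]]
```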

(3) Ensemble construction
Although Bayesian hyperparameter optimization is data-efficient at finding the best-performing configuration, it is in fact a wasteful procedure: every model trained along the way is thrown out, usually including some models that perform nearly as well as the winner. Instead of discarding these models, auto-sklearn stores them and uses them to build an efficient ensemble.
Simply building a uniformly weighted ensemble of the models found by Bayesian optimization does not work well; what matters is adjusting the weights using the predictions of all retained models on a single held-out set. Approaches for learning these weights include:

  • stacking
  • gradient-free numerical optimization
  • ensemble selection (sketched below)
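Of these, ensemble selection is what auto-sklearn uses. Here is a compact sketch of the greedy procedure, under an assumed setup where `val_preds` holds each stored model's class-probability predictions on a held-out set:

```python
# Greedy ensemble selection (in the spirit of Caruana et al., 2004):
# repeatedly add, with replacement, the model that most improves the
# ensemble's validation accuracy; selection counts become the weights.
import numpy as np

def ensemble_selection(val_preds, y_val, n_iters=20):
    counts = np.zeros(len(val_preds), dtype=int)
    summed = np.zeros_like(val_preds[0], dtype=float)
    for _ in range(n_iters):
        scores = []
        for p in val_preds:
            candidate = (summed + p) / (counts.sum() + 1)
            scores.append((candidate.argmax(axis=1) == y_val).mean())
        best = int(np.argmax(scores))
        counts[best] += 1
        summed += val_preds[best]
    return counts / counts.sum()  # normalized ensemble weights
```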

2.2 Automatic model selection based on evolutionary algorithms

Besides Bayesian methods, evolutionary algorithms can be used to select models automatically. The most classic framework here is TPOT (Tree-based Pipeline Optimization Tool).

2.2.1 The four types of operators in a TPOT pipeline

1) Data preprocessing operators
2) Feature decomposition operators
3) Feature selection operators
4) Model selection operators

A sketch of how these four types compose into one pipeline follows.
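The pipeline below is a single linear composition written with scikit-learn; the operator choices are hypothetical, and TPOT itself arranges such operators into trees and evolves both structure and parameters:

```python
# One linear pipeline composed from the four TPOT operator types.
from sklearn.decomposition import PCA                # 2) feature decomposition
from sklearn.ensemble import RandomForestClassifier  # 4) model selection
from sklearn.feature_selection import SelectKBest    # 3) feature selection
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler     # 1) data preprocessing

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("decompose", PCA(n_components=10)),
    ("select", SelectKBest(k=5)),
    ("model", RandomForestClassifier(n_estimators=100)),
])
```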

2.2.2 Assembling tree-based pipelines

2.2.3 Evolving pipelines in TPOT with genetic programming and Pareto optimization

To automatically generate and optimize these tree-based pipelines, TPOT uses a well-established evolutionary computation technique called genetic programming (GP), as implemented in the Python package DEAP. Traditionally, GP builds trees of mathematical functions and optimizes them toward a given criterion. In TPOT, GP evolves both the sequence of pipeline operators and the parameters of each operator (e.g., the number of trees in a random forest, or the number of features to keep during feature selection) so as to maximize the pipeline's classification accuracy. GP modifies pipelines by changing, removing, or inserting operators in the tree-based pipeline's operator sequence.
TPOT's final goal is twofold: maximize the pipeline's classification accuracy on the test set while minimizing the pipeline's overall complexity (i.e., the total number of pipeline operators), which it treats as a Pareto (multi-objective) optimization problem.
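A minimal TPOT run looks like the following (assuming the `tpot` package in its classic 0.x API; the dataset and budgets are arbitrary):

```python
# Minimal TPOT run: GP (via DEAP) evolves pipelines toward high accuracy
# and low complexity, then the best pipeline is exported as Python code.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tpot = TPOTClassifier(
    generations=5,       # number of GP iterations
    population_size=20,  # pipelines per generation
    verbosity=2,
    random_state=42,
)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")  # write the winning pipeline out as code
```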

3. Automated ensemble learning

Strategies for combining learners in an ensemble:

  • voting
  • averaging
  • learning a combiner from the base models' outputs (e.g., stacking)
    H2O Ensemble, for example, lets users train stacked ensembles of supervised learning models; a sketch follows below.
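Below is a hedged sketch of the learning-based strategy using H2O's stacked ensembles. It assumes the `h2o` Python package and a running local H2O cluster; the dataset path and column layout are hypothetical:

```python
# Sketch of a stacked ensemble in H2O (assumes a running local cluster;
# "train.csv" and its column layout are hypothetical).
import h2o
from h2o.estimators import (H2OGradientBoostingEstimator,
                            H2ORandomForestEstimator,
                            H2OStackedEnsembleEstimator)

h2o.init()
train = h2o.import_file("train.csv")  # hypothetical dataset
x = train.columns[:-1]                # features: all but the last column
y = train.columns[-1]                 # target: the last column
train[y] = train[y].asfactor()

# Base learners need identical cross-validation folds so their
# out-of-fold predictions can train the metalearner.
common = dict(nfolds=5, fold_assignment="Modulo",
              keep_cross_validation_predictions=True, seed=1)
gbm = H2OGradientBoostingEstimator(**common)
rf = H2ORandomForestEstimator(**common)
gbm.train(x=x, y=y, training_frame=train)
rf.train(x=x, y=y, training_frame=train)

ensemble = H2OStackedEnsembleEstimator(base_models=[gbm, rf])
ensemble.train(x=x, y=y, training_frame=train)
```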
