[Recommended] 5 most frequently used AutoML frameworks

Hello everyone, the tasks performed by AutoML can be summarized as follows:

  • Preprocess and clean data
  • Select and build appropriate features
  • Choose the right model
  • Optimize model hyperparameters
  • Design the topology of the neural network (if using deep learning)
  • Postprocess machine learning models
  • Visualization and presentation of results
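To make the list above concrete, here is a minimal sketch in pure Python of the loop an AutoML tool automates: preprocess, try several candidate models, evaluate each on held-out data, and keep the best. The data is synthetic and the two candidate models and all helper names are made up for illustration; real frameworks search far larger spaces.

```python
import random
import statistics

def fit_scaler(train_values):
    """Preprocessing: learn mean and spread on the training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0
    return lambda v: (v - mean) / std

def fit_mean_baseline(xs, ys):
    """Candidate model 1: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Candidate model 2: one-feature least-squares line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def mean_abs_error(model, xs, ys):
    return statistics.fmean([abs(model(x) - y) for x, y in zip(xs, ys)])

# Synthetic data (assumption for illustration): y = 2x + 1 plus noise
rng = random.Random(0)
raw_x = [rng.uniform(0, 100) for _ in range(200)]
y = [2 * x + 1 + rng.gauss(0, 5) for x in raw_x]
raw_train, raw_valid = raw_x[:150], raw_x[150:]
y_train, y_valid = y[:150], y[150:]

# 1. Preprocess: fit the scaler on training data, apply it to both splits
scale = fit_scaler(raw_train)
x_train = [scale(v) for v in raw_train]
x_valid = [scale(v) for v in raw_valid]

# 2. Model selection: fit every candidate and score it on the validation split
candidates = {"mean_baseline": fit_mean_baseline, "linear": fit_linear}
scores = {}
for name, fit in candidates.items():
    model = fit(x_train, y_train)
    scores[name] = mean_abs_error(model, x_valid, y_valid)

# 3. Keep the candidate with the lowest validation error
best_name = min(scores, key=scores.get)
print(scores, "->", best_name)
```

The libraries below automate exactly this kind of loop, but add hyperparameter search, richer preprocessing, and smarter search strategies.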

In this article, I will walk through the 5 most common and well-known open source AutoML libraries and frameworks. If you find it useful, remember to bookmark, like, and follow.

  • Auto-Sklearn
  • TPOT
  • Hyperopt Sklearn
  • AutoKeras
  • H2O AutoML


1、Auto-Sklearn

Auto-sklearn is an out-of-the-box automated machine learning library. It builds on scikit-learn, automatically searching for a suitable learning algorithm and optimizing its hyperparameters. It finds good data processing pipelines and models by combining meta-learning, Bayesian optimization, and ensemble learning. It also handles much of the tedious work of preprocessing and feature engineering, such as one-hot encoding, feature normalization, and dimensionality reduction.
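The ensemble learning step can be illustrated with greedy ensemble selection, a well-known way to combine models that were already evaluated during a search. The sketch below is a simplified, pure Python illustration with made-up validation predictions; it is not auto-sklearn's actual implementation, and all names are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equally long lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def average(predictions):
    """Element-wise average of several prediction lists."""
    n = len(predictions)
    return [sum(col) / n for col in zip(*predictions)]

def greedy_ensemble(model_preds, target, rounds=4):
    """Repeatedly add (with replacement) the model that gives the
    averaged ensemble the lowest validation error."""
    chosen = []
    for _ in range(rounds):
        best_name = min(
            model_preds,
            key=lambda name: mse(
                average([model_preds[m] for m in chosen + [name]]), target
            ),
        )
        chosen.append(best_name)
    return chosen

# Made-up validation targets and per-model predictions
y_valid = [1.0, 2.0, 3.0, 4.0]
model_preds = {
    "over": [1.5, 2.5, 3.5, 4.5],    # always 0.5 too high
    "under": [0.5, 1.5, 2.5, 3.5],   # always 0.5 too low
    "noisy": [3.0, 0.0, 5.0, 2.0],   # off by 2 everywhere
}

members = greedy_ensemble(model_preds, y_valid, rounds=4)
ensemble_pred = average([model_preds[m] for m in members])
print(members, mse(ensemble_pred, y_valid))
```

In this toy case the two biased models cancel each other out, so the ensemble beats every single model, which is the motivation for ensembling the candidates instead of keeping only the best one.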

Install:

#pip
pip install auto-sklearn
#conda
conda install -c conda-forge auto-sklearn

Because auto-sklearn wraps scikit-learn so thoroughly, it is used in essentially the same way as scikit-learn. The following is the sample code:

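import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection
import autosklearn.regression

# Load the diabetes regression dataset and hold out a test split
X, y = sklearn.datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)

# Search for 120 seconds in total, with at most 30 seconds per candidate model
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder='/tmp/autosklearn_regression_example_tmp',
)
automl.fit(X_train, y_train, dataset_name='diabetes')

# Evaluate the final model on the held-out test split
print("Test R2 score:", sklearn.metrics.r2_score(y_test, automl.predict(X_test)))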

Code address: https://github.com/automl/auto-sklearn

2、TPOT

TPOT (Tree-based Pipeline Optimization Tool) is a Python automated machine learning tool that uses genetic programming to optimize machine learning pipelines. Like auto-sklearn, it builds on the data transformations and models provided by scikit-learn, but it explores the space of pipelines with a stochastic, global evolutionary search. The figure below shows the TPOT search process:

[Figure: TPOT search process]
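The evolutionary search idea can be sketched in a few lines of pure Python. This is a simplified genetic algorithm over fixed-length pipeline configurations, not TPOT's tree-based genetic programming; the search space, the `fitness` function (a stand-in for a cross-validated score), and all names are assumptions made for illustration.

```python
import random

# Hypothetical search space: one choice per pipeline stage
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "model": ["tree", "knn", "logreg"],
    "complexity": list(range(1, 11)),
}

def random_pipeline(rng):
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}

def fitness(pipeline):
    """Made-up score; a real tool would train and cross-validate the pipeline."""
    score = {"none": 0.0, "standard": 0.2, "minmax": 0.1}[pipeline["scaler"]]
    score += {"tree": 0.3, "knn": 0.2, "logreg": 0.1}[pipeline["model"]]
    score -= abs(pipeline["complexity"] - 6) * 0.05
    return score

def crossover(a, b, rng):
    """Build a child by taking each setting from one of the two parents."""
    return {key: rng.choice([a[key], b[key]]) for key in SEARCH_SPACE}

def mutate(pipeline, rng, rate=0.2):
    """Randomly resample each setting with probability `rate`."""
    return {
        key: rng.choice(SEARCH_SPACE[key]) if rng.random() < rate else value
        for key, value in pipeline.items()
    }

def evolve(generations=20, population_size=20, seed=42):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: the better half survives unchanged (elitism)
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        # Variation: refill the population with mutated offspring
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

The `generations` and `population_size` arguments play the same role as the parameters of the same name in the TPOT sample code below: more generations and a larger population mean a longer but more thorough search.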

Install:

#pip
pip install tpot
#conda
conda install -c conda-forge tpot

Sample code:

from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data.astype(np.float64),
    iris.target.astype(np.float64), train_size=0.75, test_size=0.25, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_iris_pipeline.py')

Code address: https://github.com/EpistasisLab/tpot

3、HyperOpt-Sklearn

HyperOpt-Sklearn is a wrapper around HyperOpt that brings HyperOpt-based AutoML to scikit-learn. The library covers data preprocessing transformations as well as classification and regression models. According to the documentation, it is designed for large-scale optimization of models with hundreds of parameters and allows the optimization process to scale across multiple cores and multiple machines.
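The key idea behind searching over "any" estimator is a conditional search space: the algorithm is chosen first, and only the hyperparameters belonging to that algorithm are sampled. The pure Python sketch below illustrates this with plain random sampling swapped in for HyperOpt's TPE algorithm; the search space, the `loss` function, and all names are hypothetical.

```python
import math
import random

def sample_config(rng):
    """Pick an estimator, then sample only its own hyperparameters."""
    estimator = rng.choice(["knn", "ridge"])
    if estimator == "knn":
        return {"estimator": "knn", "n_neighbors": rng.randint(1, 30)}
    # Sample alpha on a log scale between 1e-3 and 1e2
    return {"estimator": "ridge", "alpha": 10 ** rng.uniform(-3, 2)}

def loss(config):
    """Made-up validation loss; a real run would train and score the model."""
    if config["estimator"] == "knn":
        return 0.30 + abs(config["n_neighbors"] - 7) * 0.01
    return 0.25 + abs(math.log10(config["alpha"])) * 0.05

def minimize(max_evals=50, seed=0):
    """Evaluate `max_evals` sampled configurations and return the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_evals):
        config = sample_config(rng)
        trials.append((loss(config), config))
    best_loss, best_config = min(trials, key=lambda trial: trial[0])
    return best_loss, best_config, trials

best_loss, best_config, trials = minimize(max_evals=50)
print(best_loss, best_config)
```

A model-based algorithm such as TPE differs from this sketch in one respect: it uses the losses of earlier trials to decide where to sample next, instead of sampling every configuration independently.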

Install:

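pip install hyperopt
# the hpsklearn package imported below comes from the hyperopt-sklearn repository
pip install git+https://github.com/hyperopt/hyperopt-sklearn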

Sample code:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from hpsklearn import HyperoptEstimator
from hpsklearn import any_regressor
from hpsklearn import any_preprocessing
from hyperopt import tpe

# load dataset (the iris class labels are treated as numeric regression targets here)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data.astype(np.float64),
    iris.target.astype(np.float64), train_size=0.75, test_size=0.25, random_state=42)

# search over regressors and preprocessing steps with TPE
model = HyperoptEstimator(
    regressor=any_regressor('reg'),
    preprocessing=any_preprocessing('pre'),
    loss_fn=mean_absolute_error,
    algo=tpe.suggest,
    max_evals=50,
    trial_timeout=30,
)
model.fit(X_train, y_train)

# summarize performance: compute the MAE explicitly from the predictions
mae = mean_absolute_error(y_test, model.predict(X_test))
print("MAE: %.3f" % mae)
# summarize the best model
print(model.best_model())

Code address: https://github.com/hyperopt/hyperopt-sklearn

4、AutoKeras

AutoKeras is a Keras-based AutoML system that delivers the power of Neural Architecture Search (NAS) in just a few lines of code. It was developed by the DATA Lab at Texas A&M University and is built on Keras and TensorFlow's tf.keras API.

AutoKeras supports a range of tasks, such as image classification and structured data classification or regression.

Install:

pip install autokeras

Sample code:

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
import autokeras as ak
#Load dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)  # (60000, 28, 28)
print(y_train.shape)  # (60000,)
print(y_train[:3])  # [5 0 4]

# Initialize the image classifier.
clf = ak.ImageClassifier(overwrite=True, max_trials=1)
# Feed the image classifier with training data.
clf.fit(x_train, y_train, epochs=10)

# Predict with the best model.
predicted_y = clf.predict(x_test)
print(predicted_y)
# Evaluate the best model with testing data.
print(clf.evaluate(x_test, y_test))

Code address: https://github.com/keras-team/autokeras

5、H2O AutoML

H2O's AutoML can be used to automatically train and tune many models within a user-specified time limit.
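The budget idea can be sketched as a loop that trains candidates until a time or model limit is reached and then ranks them on a leaderboard. This is a pure Python illustration, not H2O's implementation: `run_automl` is a hypothetical helper, the candidate scores are made up, and only the parameter names `max_runtime_secs` and `max_models` mirror H2OAutoML options.

```python
import time

def run_automl(candidates, max_runtime_secs=1.0, max_models=None):
    """Train candidates in order until the time or model budget is used up,
    then return them ranked from best to worst score."""
    deadline = time.monotonic() + max_runtime_secs
    leaderboard = []
    for name, train in candidates:
        if time.monotonic() >= deadline:
            break  # time budget exhausted
        if max_models is not None and len(leaderboard) >= max_models:
            break  # model budget exhausted
        score = train()  # stand-in for training and validating a model
        leaderboard.append((name, score))
    leaderboard.sort(key=lambda row: row[1], reverse=True)
    return leaderboard

# Made-up candidates; each "training" call simply returns a fixed score
candidates = [
    ("glm", lambda: 0.78),
    ("gbm", lambda: 0.84),
    ("random_forest", lambda: 0.81),
    ("xgboost", lambda: 0.83),
]

leaderboard = run_automl(candidates, max_runtime_secs=5.0, max_models=3)
leader = leaderboard[0]
print(leaderboard)
print("leader:", leader)
```

With `max_models=3`, the fourth candidate is never trained, which shows the trade-off a budget imposes: a tighter limit finishes sooner but may miss a stronger model.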

H2O provides a number of interpretability methods for AutoML objects (groups of models) as well as individual models. Explanations can be automatically generated and provide a simple interface to explore and interpret AutoML models.

Install:

pip install h2o

More precisely, H2O is a distributed machine learning platform, so an H2O cluster has to be started before use. The platform itself is written in Java, so a JDK must be installed.

After Java is installed and the environment variable points to the Java path, run the following command in cmd:

java -jar path_to/h2o.jar

This starts the H2O cluster, which you can then operate through the web interface. If you prefer to work from Python code, you can use the following example:

import h2o
h2o.init()
from h2o.automl import H2OAutoML
churn_df = h2o.import_file('https://raw.githubusercontent.com/srivatsan88/YouTubeLI/master/dataset/WA_Fn-UseC_-Telco-Customer-Churn.csv')
churn_df.types
churn_df.describe()
churn_train,churn_test,churn_valid = churn_df.split_frame(ratios=[.7, .15])
churn_train
y = "Churn"
x = churn_df.columns
x.remove(y)
x.remove("customerID")
# Train at most 10 models, skipping stacked ensembles and deep learning
aml = H2OAutoML(max_models = 10, seed = 10, exclude_algos = ["StackedEnsemble", "DeepLearning"], verbosity="info", nfolds=0)
aml.train(x = x, y = y, training_frame = churn_train, validation_frame=churn_valid)

lb = aml.leaderboard
lb.head()
churn_pred=aml.leader.predict(churn_test)
churn_pred.head()
aml.leader.model_performance(churn_test)
model_ids = list(aml.leaderboard['model_id'].as_data_frame().iloc[:,0])
#se = h2o.get_model([mid for mid in model_ids if "StackedEnsemble_AllModels" in mid][0])
#metalearner = h2o.get_model(se.metalearner()['name'])
model_ids
h2o.get_model([mid for mid in model_ids if "XGBoost" in mid][0])
out = h2o.get_model([mid for mid in model_ids if "XGBoost" in mid][0])
out.params
out.convert_H2OXGBoostParams_2_XGBoostParams()
out
out_gbm = h2o.get_model([mid for mid in model_ids if "GBM" in mid][0])
out.confusion_matrix()
out.varimp_plot()
aml.leader.download_mojo(path = "./")

Code address: https://github.com/h2oai/h2o-3

Summary

In this article, we summarized 5 AutoML libraries and looked at how each one automates machine learning tasks such as data preprocessing, hyperparameter tuning, model selection, and evaluation.

In addition to these 5 common libraries, there are other AutoML libraries and services, such as AutoGluon, MLBox, TransmogrifAI, Auto-WEKA, AdaNet, MLJAR, Azure Machine Learning, Ludwig, etc.


Origin blog.csdn.net/weixin_38037405/article/details/124092606