Automatic machine learning (AutoML)

AutoML

Model selection and hyperparameter tuning are tedious work for developers of machine learning algorithms. Automated machine learning (AutoML) emerged so that machines can design and tune models automatically.

Automated machine learning, also known as automated ML or AutoML, is the process of automating the time-consuming, repetitive tasks in machine learning model development. Its main purpose is to lower the barrier to building and deploying machine learning models. By automating tasks such as feature selection, model selection, and hyperparameter tuning, AutoML reduces the need for manual intervention and can improve model performance.

For neural networks, AutoML is mainly based on Neural Architecture Search (NAS). NAS generates models automatically: it defines a search space, applies a search strategy, and then automatically evaluates the performance of each generated model. NAS is now widely used for models in image classification, object detection, speech recognition, and natural language processing.

  • Search space definition: NAS first defines a search space that contains all possible neural network structures. This can include different types of layers, how the layers are connected to each other, the width and depth of each layer, etc.
  • Search strategy: NAS uses a search algorithm, usually a heuristic or an optimization method, to find the best network structure in the search space. Common choices include reinforcement learning, evolutionary (genetic) algorithms, and Bayesian optimization.
  • Evaluation and update: Each candidate network structure must be evaluated. This is usually done by training the network and scoring it on a validation set. The state of the search algorithm (for example, the weights of a controller network) is then updated based on the results to guide the next round of search.
  • Search acceleration: Because the NAS search space is large, the search can be very expensive. Researchers have therefore proposed methods to speed it up, such as parameter sharing, model scaling, and surrogate models.
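
The loop described above (search space, search strategy, evaluation) can be sketched in a few lines. This is only a minimal illustration, assuming random search as the strategy. The search space and its values are hypothetical, and the evaluate function is a made-up stand-in for training a network and scoring it on a validation set.

```python
import random

# Hypothetical search space: every name and value here is illustrative.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "width": [16, 32, 64, 128],
    "activation": ["relu", "tanh"],
}

def sample_architecture(rng):
    """Search strategy (random search): draw one candidate from the space."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}

def evaluate(arch):
    """Placeholder for 'train the network and score it on a validation set'.

    A real NAS system would train `arch` here. This stand-in returns a
    made-up score so the loop runs without any ML framework.
    """
    capacity = arch["num_layers"] * arch["width"]
    bonus = 0.02 if arch["activation"] == "relu" else 0.0
    # The score peaks at a moderate capacity to mimic under- and overfitting.
    return 0.9 - abs(capacity - 128) / 1000 + bonus

def random_search(num_trials=20, seed=0):
    rng = random.Random(seed)
    cache = {}  # skip re-evaluating duplicate candidates to save cost
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        key = tuple(sorted(arch.items()))
        if key not in cache:
            cache[key] = evaluate(arch)
        if cache[key] > best_score:
            best_arch, best_score = arch, cache[key]
    return best_arch, best_score

best_arch, best_score = random_search()
print(best_arch, round(best_score, 3))
```

Random search does not update any internal state between trials. A reinforcement learning or Bayesian strategy would replace sample_architecture with a sampler that learns from the scores collected so far.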

Using AutoML typically involves the following steps:

  1. Data preparation: Prepare the data sets for training and evaluation.
  2. Choose an AutoML tool: Choose an AutoML tool or platform that suits your task. Common options include Auto-Sklearn, AutoKeras, AutoGluon, Google AutoML, Azure automated machine learning, and Auto-PyTorch.
  3. Configure search space: Define the search space for the model and hyperparameters. This may involve choosing an appropriate model type, setting ranges for hyperparameters, etc.
  4. Run AutoML: Start the AutoML tool and let it automatically search for the best model and hyperparameter combination in the defined search space.
  5. Model evaluation: Evaluate the model obtained by AutoML to understand its performance. This usually involves using a validation set or cross-validation to check the model's ability to generalize.
  6. Deploy the model: Once satisfied, deploy the AutoML-generated model to a production environment for actual predictions.
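
To make steps 1, 3, 4 and 5 concrete, here is a minimal sketch that uses only the Python standard library. It is not how any particular AutoML tool works internally. The synthetic data, the tiny k-nearest-neighbours model, and the candidate values for k are all assumptions chosen for illustration.

```python
import random

def make_data(n, rng):
    """Step 1: synthetic two-class data (two Gaussian blobs in 2-D)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label else -2.0
        data.append(((rng.gauss(center, 1.5), rng.gauss(center, 1.5)), label))
    return data

def knn_predict(train, point, k):
    """Predict by majority vote among the k nearest training points."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, eval_set, k):
    """Fraction of points in eval_set that are classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in eval_set)
    return hits / len(eval_set)

rng = random.Random(42)
data = make_data(300, rng)
train, valid, test = data[:180], data[180:240], data[240:]

# Step 3: the search space is a list of candidate values for k (odd, so no ties).
candidate_ks = [1, 3, 5, 9, 15]

# Step 4: try every candidate and keep the one with the best validation score.
best_k = max(candidate_ks, key=lambda k: accuracy(train, valid, k))

# Step 5: report performance on data that was not used during the search.
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

Keeping a separate test split matters here: the validation score was used to pick k, so it is an optimistic estimate of how the chosen model generalizes.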

The following sections introduce five AutoML tools: Auto-Sklearn, AutoKeras, AutoGluon, Google AutoML, and Azure automated machine learning.

Auto-Sklearn

Auto-Sklearn is a popular AutoML tool built on the Scikit-Learn library that provides automated model selection and tuning. In practice, more detailed configuration may be needed depending on the data, such as choosing a different metric or adjusting the model search space. Auto-Sklearn exposes many configuration options so that users can customize the automated process. Details of the selected models can also be accessed through Auto-Sklearn's API. For more details, refer to the official GitHub repository.

Python code example: use Auto-Sklearn to classify the handwritten digits (Digits) dataset, which you can replace with your own dataset. The time_left_for_this_task and per_run_time_limit parameters limit, in seconds, the total search time and the time allowed for a single model run.

# Install Auto-Sklearn (run in a shell; prefix with "!" in a notebook)
# pip install auto-sklearn

# Import the required libraries
import autosklearn.classification
from sklearn.model_selection import train_test_split
from sklearn import datasets

# Load the example dataset (replace with your own dataset if needed)
X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create and train the Auto-Sklearn classifier
automl_classifier = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,  # total search budget in seconds
    per_run_time_limit=30,        # budget for a single model run in seconds
)
automl_classifier.fit(X_train, y_train)

# Evaluate model performance on the test set
accuracy = automl_classifier.score(X_test, y_test)
print(f"Accuracy: {accuracy}")

# Show information about the models Auto-Sklearn selected
print(automl_classifier.show_models())

AutoKeras

AutoKeras is an open source AutoML framework based on Keras, developed at Texas A&M University. It is easy to get started with and lets you try a variety of neural network structures. For more details, refer to the official GitHub repository.

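A simple Python example (MNIST is used here as sample data; replace it with your own images and labels):

# Install autokeras (run in a shell)
# pip install autokeras

# Import autokeras and a sample dataset
import autokeras as ak
from tensorflow.keras.datasets import mnist

# Load sample data (assumption: MNIST stands in for your own dataset)
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Create the model; max_trials limits how many architectures are tried
clf = ak.ImageClassifier(max_trials=1)

# Train the model
clf.fit(x_train, y_train, epochs=1)

# Predict on the test set
results = clf.predict(x_test)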

AutoGluon

AutoGluon is another open source AutoML framework, developed by Amazon Web Services (AWS). It is easy to get started with and is designed to simplify training and deploying machine learning models, so that users can build high-performance models without deep expertise. Whereas other AutoML frameworks focus mainly on model and hyperparameter selection, AutoGluon achieves strong performance by ensembling multiple models and stacking them in multiple layers. For more details, refer to the official GitHub repository.

Python code example: Suppose you have a tabular data set for binary classification, where the class column is the target label. AutoGluon automatically selects an appropriate model and optimizes hyperparameters, then returns performance metrics and prediction results on the test set.

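# Install AutoGluon (run in a shell; prefix with "!" in a notebook)
# pip install autogluon

# Import the AutoGluon tabular modules
# (older releases exposed this as `from autogluon import TabularPrediction`)
from autogluon.tabular import TabularDataset, TabularPredictor

# Load the example dataset (replace with your own dataset if needed)
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')

# Declare a binary classification task on the 'class' column and train
predictor = TabularPredictor(label='class', problem_type='binary').fit(train_data)

# Evaluate model performance on the test set
performance = predictor.evaluate(test_data)

# Print the performance metrics
print(performance)

# Make predictions
predictions = predictor.predict(test_data)
print(predictions)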

Google AutoML

Google Cloud AutoML is a service on the Google Cloud platform that provides powerful automated machine learning capabilities. Please note that using Google AutoML requires a Google Cloud account and fees may be involved. Here are the development steps:

  1. Create a Google Cloud project: Make sure you create a project on Google Cloud and enable the AutoML API.
  2. Upload your data: In the Google Cloud Console, navigate to the AutoML Vision page and upload the dataset containing images and labels. Follow the instructions to upload and tag images.
  3. Train the model: After uploading the data, select the model configuration and hyperparameters, and then start training the model.
  4. Wait for model training to complete: Depending on the size of the data set, model training may take some time. You can track training progress on the Google Cloud Console.
  5. Evaluate the model: After training is complete, you can evaluate the model on the AutoML Vision page and view metrics such as the confusion matrix and accuracy.
  6. Use the model to make predictions: Once you are satisfied, you can use the trained model to make predictions. An online API is available on the Google Cloud Console, or you can download the model and deploy it locally.

Here is a simplified Python example that requests a prediction from a trained Google AutoML Vision image classification model.

from google.cloud import automl

# Replace with your project ID, model ID, and file path
project_id = "your-project-id"
model_id = "your-model-id"
file_path = "path/to/your/image.jpg"

# Create the prediction client (predictions go through PredictionServiceClient)
prediction_client = automl.PredictionServiceClient()

# Build the full model path
model_full_id = f"projects/{project_id}/locations/us-central1/models/{model_id}"

# Read the image file
with open(file_path, "rb") as content_file:
    content = content_file.read()

# Wrap the image bytes in a request payload
image = automl.Image(image_bytes=content)
payload = automl.ExamplePayload(image=image)

# Build the prediction request
request = automl.PredictRequest(name=model_full_id, payload=payload)

# Send the prediction request
response = prediction_client.predict(request=request)

# Parse the prediction results
for result in response.payload:
    print(f"Predicted class: {result.display_name}")
    print(f"Confidence: {result.classification.score}")

Azure automated machine learning

Azure automated machine learning is an AutoML service developed by Microsoft. During training, it creates multiple pipelines in parallel that try different algorithms and parameters. The service iterates through ML algorithms paired with feature selections, and each iteration produces a model with a training score. The better the score for the metric being optimized, the better the model is considered to "fit" the data. Training stops once the exit criteria defined in the experiment are reached.

Azure automated machine learning can be applied to classification, regression, time series forecasting, computer vision, and natural language processing. Specifically:

  • The main goal of a classification model is to predict which categories new data will belong to based on the experience gained from its training data. Common classification examples include fraud detection, handwriting recognition, and object detection.
  • Regression models predict numerical output values based on independent predictors. In regression, the goal is to establish the relationship among those independent predictor variables by estimating how one variable affects the others. For example, a model might predict car prices from features such as gas mileage and safety rating.
  • Time series forecasting is treated as a multivariate regression problem. Past time series values are "pivoted" to become additional dimensions for the regressor, alongside other predictors. Unlike classical time series methods, this approach naturally incorporates multiple contextual variables and their relationships during training. Automated ML learns a single model, often with internal branches, for all items in the dataset and all forecast horizons. More data is therefore available to estimate model parameters, and generalization to unseen series becomes possible. Advanced forecasting configurations include:
    (1) Holiday detection and featurization
    (2) Time series and DNN learners (Auto-ARIMA, Prophet, ForecastTCN)
    (3) Many-models support through grouping
    (4) Rolling-origin cross-validation
    (5) Configurable lags
    (6) Rolling window aggregate features
  • Computer vision support covers multi-class image classification, multi-label image classification, object detection, and instance segmentation. These tasks run at scale by leveraging Azure Machine Learning MLOps and ML pipeline capabilities.
  • Automated ML for natural language processing makes it easy to generate models trained on text data for text classification and named entity recognition. It offers multilingual support for 104 languages and supports Horovod for distributed training.
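
The time series item above mentions pivoting past values into regressor inputs, rolling window aggregates, and rolling-origin cross-validation. The following is a minimal sketch of those three ideas in plain Python. It is not Azure's implementation, and the function names, the sample series, and the parameter values are all hypothetical.

```python
def make_lag_features(series, lags, window):
    """Turn a 1-D series into (features, target) rows for a regressor.

    Each row holds the `lags` previous values plus the mean of the previous
    `window` values; the target is the current value.
    """
    start = max(lags, window)
    rows = []
    for t in range(start, len(series)):
        lag_values = [series[t - i] for i in range(1, lags + 1)]
        rolling_mean = sum(series[t - window:t]) / window
        rows.append((lag_values + [rolling_mean], series[t]))
    return rows

def rolling_origin_splits(n_rows, n_folds, horizon):
    """Yield (train_indices, test_indices) with an expanding training window."""
    for fold in range(n_folds):
        test_end = n_rows - (n_folds - 1 - fold) * horizon
        test_start = test_end - horizon
        yield list(range(test_start)), list(range(test_start, test_end))

series = [10, 12, 13, 15, 14, 16, 18, 17, 19, 21]  # made-up observations
rows = make_lag_features(series, lags=2, window=3)
print(rows[0])  # features are [lag 1, lag 2, rolling mean], then the target

for train_idx, test_idx in rolling_origin_splits(len(rows), n_folds=3, horizon=2):
    print(len(train_idx), test_idx)
```

Each split trains only on rows that come before the test rows, which is what distinguishes rolling-origin validation from ordinary shuffled cross-validation.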

Automated machine learning supports ensemble models, which are enabled by default. Ensemble learning improves machine learning results and predictive performance by combining multiple models rather than relying on a single one. The ensemble iterations appear as the final iterations of a run. Automated machine learning uses voting and stacking ensemble methods to combine models:

  • Voting: Make predictions based on a weighted average of predicted class probabilities (for classification tasks) or predicted regression targets (for regression tasks).
  • Stacking: Combines heterogeneous models and trains a meta-model on the outputs of the individual models. The current default meta-models are LogisticRegression (for classification tasks) and ElasticNet (for regression and forecasting tasks).
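
As a small illustration of the voting method for classification, the sketch below computes a weighted average of predicted class probabilities. The probabilities and weights are made up, and the helper name soft_vote is hypothetical. Stacking would go one step further and train a meta-model on such per-model outputs.

```python
def soft_vote(prob_lists, weights):
    """Weighted average of per-model class probabilities for one sample."""
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]

# Hypothetical outputs of three models for one sample with classes 0, 1, 2.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
]
weights = [1.0, 2.0, 1.0]  # illustrative weights, e.g. derived from validation scores

averaged = soft_vote(model_probs, weights)
# The ensemble predicts the class with the highest averaged probability.
predicted_class = max(range(len(averaged)), key=averaged.__getitem__)
print(averaged, predicted_class)
```

Although the first model alone would pick class 0, the ensemble picks class 1 because the second model carries twice the weight.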

Additionally, in Azure Automated Machine Learning, scaling and normalization techniques are applied to simplify feature engineering (feature engineering is the process of using data domain knowledge to create features that help optimize the learning effect of machine learning algorithms).
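
For reference, two common scaling techniques can be sketched as follows. This is a generic illustration and not the specific set of transformations Azure applies. The function names and the sample column are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the standard deviation.

    Assumes the values are not all identical (otherwise the deviation is 0).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1].

    Assumes max(values) > min(values).
    """
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

prices = [12000.0, 18000.0, 24000.0, 30000.0]  # hypothetical feature column
print(standardize(prices))
print(min_max_scale(prices))
```

Both transforms keep the ordering of the values. They only change the scale, which helps algorithms that are sensitive to feature magnitude.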

The content above on Azure automated machine learning is based on the official website. Only its principles and features are introduced here. See the official website for more details, such as how to set up Azure, data preprocessing, model training and testing, model deployment, and model visualization and monitoring, all with detailed code examples.

Origin blog.csdn.net/weixin_43603658/article/details/134724323