PaddlePaddle Upgrade Explained | Complete Transfer Learning with a Dozen Lines of Code

Transfer learning is a subfield of deep learning. It studies how to exploit similarities between data, tasks, or models so that knowledge learned in an old domain can be applied to a new one. Transfer learning has attracted many researchers because it offers a good answer to the following problems in deep learning:

 

  • Some research fields have only a small amount of labeled data, and labeling is expensive, so there is not enough data to train a sufficiently robust neural network

  • Training large-scale neural networks depends on massive computing resources, which is hard for the average user to obtain

  • Models built to be general-purpose often perform unsatisfactorily on specific applications

 

To let developers apply transfer learning more easily, Baidu's PaddlePaddle has open-sourced PaddleHub, a management tool for pre-trained models. With only a dozen lines of code, developers can complete a transfer learning task. This article gives a comprehensive introduction to PaddleHub and how to use it.

 

Project address: https://github.com/PaddlePaddle/PaddleHub

 

PaddleHub Introduction

 

PaddleHub is a pre-trained model management tool built on PaddlePaddle. It makes transfer learning with pre-trained models more convenient, and is designed to let developers experience the value of PaddlePaddle's large-scale pre-trained model ecosystem with minimal effort.

 

PaddleHub's pre-trained models currently cover five categories: image classification, object detection, lexical analysis, Transformer, and sentiment analysis. More types of deep learning models will be opened up over time, such as language models, video classification, and image generation. The full set of PaddleHub features is shown in Figure 1.

 

Figure 1: PaddleHub feature overview

 

PaddleHub consists of two parts: a command-line tool and the Fine-tune API.

 

Command-line tool

 

PaddleHub's command-line tool borrows ideas from package managers such as Anaconda and pip. It lets you search for, download, install, and run models quickly and easily; the corresponding key commands are search, download, install, and run.
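For example, finding and installing a module each take a single command. This is a minimal illustration of the search and install commands named above; lac and ssd are module names that also appear later in this article:

$ hub search ssd    # search for modules matching "ssd"
$ hub install lac   # install the LAC lexical analysis module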

 

The run command executes a Module for prediction. Below is one NLP example and one CV example.

 

For NLP tasks: the input data is specified with --input_text. Taking Baidu's LAC model (Chinese lexical analysis) as an example, a single line of text can be analyzed with the following command.

 

# Single-text prediction
$ hub run lac --input_text "今天是个好日子"

 

For CV tasks: the input data is specified with --input_path. Taking the SSD model (single-shot object detection) as an example, prediction on a single image can be run with the following commands.

 

# Run object detection on an image with the SSD model. The first command downloads
# a test image and the second runs the prediction; you can also supply your own image.
$ wget --no-check-certificate https://paddlehub.bj.bcebos.com/resources/test_img_bird.jpg
$ hub run ssd_mobilenet_v1_pascal --input_path test_img_bird.jpg

 

For more command usage, please see the GitHub project link at the beginning of this article.

 

Fine-tune API

 

PaddleHub provides a Fine-tune API implemented on PaddlePaddle. It focuses on higher-order abstractions for fine-tuning large-scale pre-trained models, so that pre-trained models can better serve users' specific application scenarios. By combining a large-scale pre-trained model with fine-tuning, you can obtain a converged model in less time and with better generalization. The PaddleHub API overview is shown in Figure 2.

 

Figure 2: PaddleHub Fine-tune API overview

 

  • Fine-tune: runs fine-tuning for a Task, with periodic evaluation on the validation set. During fine-tuning, the interface periodically saves checkpoints (the model and run state). If a run is interrupted, rerunning with the same checkpoint directory specified in RunConfig restores the state from the last checkpoint and continues from there.

  • Task: in PaddleHub, a Task represents a fine-tuning task. It bundles the Program that executes the task with the task's metrics (such as classification accuracy, precision, recall, and F1-score) and the model loss.

  • RunConfig: in PaddleHub, a RunConfig holds the runtime configuration used when fine-tuning a Task, including the number of epochs, the batch size, and whether to train on GPU.

  • Strategy: in PaddleHub, the Strategy class encapsulates a family of fine-tuning strategies for transfer learning, covering how the learning rate of the pre-trained parameters changes, which optimizer to use, which type of regularization to apply, and so on.

  • Module: a Module represents an executable model. Executable here means a Module can either be run from the command line with hub run ${MODULE_NAME} for prediction, or have its context fetched through the context interface for fine-tuning. When creating a Module, you can do so by name, URL, or path.

  • Reader: PaddleHub's data preprocessing module, which abstracts the preprocessing needed by common NLP and CV tasks.

  • Dataset: PaddleHub provides datasets for a variety of NLP and CV tasks, and users can also fine-tune on custom datasets. A condensed sketch of how all these components fit together follows this list.
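To make the division of labor concrete, the sketch below strings these components together into one minimal fine-tuning run. It is a condensed preview of the hands-on walkthrough later in this article, not a separate recipe; the module name, dataset, and checkpoint directory are illustrative choices:

import paddlehub as hub

# Module: load a pre-trained model by name and get its context
module = hub.Module(name="resnet_v2_50_imagenet")
input_dict, output_dict, program = module.context(trainable=True)

# Dataset + Reader: the data plus the preprocessing the module expects
dataset = hub.dataset.DogCat()
reader = hub.reader.ImageClassificationReader(
    image_width=module.get_expected_image_width(),
    image_height=module.get_expected_image_height(),
    images_mean=module.get_pretrained_images_mean(),
    images_std=module.get_pretrained_images_std(),
    dataset=dataset)

# Task: a classification head on top of the module's feature_map output
task = hub.create_img_cls_task(
    feature=output_dict["feature_map"], num_classes=dataset.num_labels)

# RunConfig + Strategy: how training runs; rerunning with the same
# checkpoint_dir resumes from the last saved checkpoint
config = hub.RunConfig(
    num_epoch=1,
    batch_size=32,
    checkpoint_dir="hub_demo_ckpt",
    strategy=hub.finetune.strategy.DefaultFinetuneStrategy())

# Fine-tune: train with periodic evaluation on the validation set
hub.finetune_and_eval(
    task,
    feed_list=[input_dict["image"].name, task.variable("label").name],
    data_reader=reader,
    config=config)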

 

Based on the two PaddleHub functions described above, users can:

 

  • Predict with a pre-trained model in a single command, without writing any code;

  • Quickly obtain any pre-trained model in the PaddlePaddle ecosystem via the hub download command;

  • Complete transfer learning with a small amount of code using the PaddleHub Fine-tune API.

 

The following section takes a practical angle and walks you through image classification transfer learning with PaddleHub.

 

PaddleHub in Practice

 

1. Install

 

PaddleHub is a pre-trained model management framework based on PaddlePaddle, so PaddlePaddle must be installed before PaddleHub can be used. If you already have the CPU or GPU version of PaddlePaddle installed locally, you can skip the following installation step.

 

$ pip install paddlepaddle      # CPU install command
or
$ pip install paddlepaddle-gpu  # GPU install command

 

It is recommended to use version 1.4.0 of PaddlePaddle.
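To pin the recommended version explicitly, standard pip version syntax can be used:

$ pip install paddlepaddle==1.4.0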

 

Then install PaddleHub with the following command

 

$ pip install paddlehub

 

2. Select the appropriate model

 

First, import the necessary Python packages

 

# -*- coding: utf8 -*-
import paddlehub as hub
import paddle.fluid as fluid

 

Next we choose a suitable pre-trained model in PaddleHub to fine-tune. Since cat-and-dog classification is an image classification task, we use the classic ResNet-50 as the pre-trained model. PaddleHub offers a rich set of image classification pre-trained models, including PNASNet from the latest neural architecture search work; we encourage you to try different pre-trained models for better performance.

 

module_map = {
    "resnet50": "resnet_v2_50_imagenet",
    "resnet101": "resnet_v2_101_imagenet",
    "resnet152": "resnet_v2_152_imagenet",
    "mobilenet": "mobilenet_v2_imagenet",
    "nasnet": "nasnet_imagenet",
    "pnasnet": "pnasnet_imagenet"
}

module_name = module_map["resnet50"]
module = hub.Module(name=module_name)

 

3. Data Preparation

 

Next we load the image dataset. For a quick start, we directly load the cat-and-dog classification dataset provided by PaddleHub. If you want to try your own data, see the custom data section below.

 

# Use the cat-and-dog dataset provided directly by PaddleHub
dataset = hub.dataset.DogCat()

 

4. Custom Data

 

This section explains how to organize custom data. If you are working with the cat-and-dog dataset, you can skip this section.

 

When using custom data, we need to partition the dataset ourselves into a training set, a validation set, and a test set.

 

We also need three text files recording the image path and label of each sample in those splits, plus one label file recording the label names.

 

├─data: data directory
  ├─train_list.txt: training set list
  ├─test_list.txt: test set list
  ├─validate_list.txt: validation set list
  ├─label_list.txt: label list
  └─……

 

The training/validation/test list files have the following format

 

image1_path image1_label
image2_path image2_label
...

 

The label list file has the following format

 

class1_name
class2_name
...
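Concretely, for the two-class cat-and-dog setup used in this tutorial, the two files might look like this (a hypothetical example; each label is the zero-based index of its class's line in label_list.txt):

# train_list.txt
dog/dog1.jpg 0
cat/cat1.jpg 1
...

# label_list.txt
dog
cat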

 

Load the data as follows to generate a dataset object.

 

Precautions:

 

  1. num_labels must be the actual number of classes: for example, 2 for the cat-and-dog dataset and 101 for food101. The example below uses 2;

  2. base_path is the actual path of the dataset and must be a full path; the example below uses /test/data;

  3. The image paths in the training/validation/test list files must be relative to base_path. For example, if an image actually sits at /test/data/dog/dog1.jpg and base_path is /test/data, then the path recorded in the list file should be dog/dog1.jpg.

 

# Use a local dataset
class MyDataSet(hub.dataset.base_cv_dataset.ImageClassificationDataset):
    def __init__(self):
        self.base_path = "/test/data"
        self.train_list_file = "train_list.txt"
        self.test_list_file = "test_list.txt"
        self.validate_list_file = "validate_list.txt"
        self.label_list_file = "label_list.txt"
        self.label_list = None
        self.num_labels = 2
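With the class defined, the dataset object is created the same way as the built-in one, and the rest of this tutorial works unchanged:

dataset = MyDataSet()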

 

5. Generate the Reader

 

Next we generate a Reader for image classification. The Reader preprocesses the data in the dataset, then organizes it in a specific format and feeds it to the model for training.

 

When generating an image classification Reader, we need to specify the input image size

 

data_reader = hub.reader.ImageClassificationReader(
    image_width=module.get_expected_image_width(),
    image_height=module.get_expected_image_height(),
    images_mean=module.get_pretrained_images_mean(),
    images_std=module.get_pretrained_images_std(),
    dataset=dataset)

 

6. Build the Fine-tune Task

 

With a suitable pre-trained model chosen and the target dataset prepared, we start building the Task.

 

Since cat-and-dog classification is a binary classification task, while the module we downloaded (ResNet-50) is a 1000-class classification model trained on the ImageNet dataset, we need to make simple modifications to turn it into a binary classification model:

 

  1. Get the module's context, including input and output variables and the Paddle Program;

  2. Find the feature extraction layer feature_map among the output variables;

  3. Attach a fully connected layer after feature_map to generate the Task.

 

# Step 1: get the module context (input/output variables and the Program)
input_dict, output_dict, program = module.context(trainable=True)

# Step 2: pick out the feature extraction layer from the output variables
img = input_dict["image"]
feature_map = output_dict["feature_map"]

# Step 3: attach a fully connected layer after feature_map to build the Task
task = hub.create_img_cls_task(
    feature=feature_map, num_classes=dataset.num_labels)

feed_list = [img.name, task.variable("label").name]

 

7. Select Runtime Configuration

 

Before fine-tuning, we can set a number of runtime options. The configuration in the code below means:

 

  • use_cuda: set to False to train on CPU. If your machine has a GPU and the GPU version of PaddlePaddle is installed, we recommend setting this option to True;

  • epoch: the fine-tuning task traverses the training set only once;

  • batch_size: 32 samples are fed to the model per training step. Batching lets the model process training data in parallel, so a larger batch_size means more efficient training, but it also increases memory pressure; a batch_size that is too large may exhaust memory and abort training, so choosing a suitable value is an important step;

  • log_interval: print a training log every 10 steps;

  • eval_interval: evaluate performance on the validation set every 50 steps;

  • checkpoint_dir: save training parameters and state to the cv_finetune_turtorial_demo directory;

  • strategy: fine-tune with the DefaultFinetuneStrategy strategy;

 

For more runtime configuration options, please see the GitHub project link at the beginning of this article.

 

config = hub.RunConfig(
    use_cuda=False,
    num_epoch=1,
    checkpoint_dir="cv_finetune_turtorial_demo",
    batch_size=32,
    log_interval=10,
    eval_interval=50,
    strategy=hub.finetune.strategy.DefaultFinetuneStrategy())

 

8. Start Fine-tuning

 

We choose the finetune_and_eval interface for model training. During fine-tuning, this interface periodically evaluates the model, so that we can follow its performance over the whole course of training.

 

hub.finetune_and_eval(
    task, feed_list=feed_list, data_reader=data_reader, config=config)

 

9. Inspect the training process

 

Performance data from the training process is recorded locally, and we can visualize it with VisualDL.

 

Enter the following command in a shell to start VisualDL, where ${HOST_IP} is the local IP address that you need to specify

 

$ visualdl --logdir ./cv_finetune_turtorial_demo/vdllog --host ${HOST_IP} --port 8989

 

After the service starts, open ${HOST_IP}:8989 in a browser to see the training loss curve and the accuracy curve, as shown below.

 

 

10. Predict with the model

 

Once fine-tuning is complete, we use the model for prediction. The whole prediction process can be divided into the following steps:

 

  1. Build the network

  2. Generate a Reader for the prediction data

  3. Switch to the inference Program

  4. Load the trained parameters

  5. Run the Program to predict

 

Download test images with the following commands (they suit the cat-and-dog dataset)

 

$ wget --no-check-certificate https://PaddleHub.bj.bcebos.com/resources/test_img_cat.jpg
$ wget --no-check-certificate https://PaddleHub.bj.bcebos.com/resources/test_img_dog.jpg

 

Note: if you used a different dataset, please prepare your own test images.

 

The complete prediction code is as follows:

 

import os
import numpy as np
import paddle.fluid as fluid
import paddlehub as hub

# Step 1: build Program
module_map = {
    "resnet50": "resnet_v2_50_imagenet",
    "resnet101": "resnet_v2_101_imagenet",
    "resnet152": "resnet_v2_152_imagenet",
    "mobilenet": "mobilenet_v2_imagenet",
    "nasnet": "nasnet_imagenet",
    "pnasnet": "pnasnet_imagenet"
}

module_name = module_map["resnet50"]
module = hub.Module(name=module_name)
input_dict, output_dict, program = module.context(trainable=False)
img = input_dict["image"]
feature_map = output_dict["feature_map"]

dataset = hub.dataset.DogCat()
task = hub.create_img_cls_task(
    feature=feature_map, num_classes=dataset.num_labels)
feed_list = [img.name]

# Step 2: create data reader
data = [
    "test_img_dog.jpg",
    "test_img_cat.jpg"
]

data_reader = hub.reader.ImageClassificationReader(
    image_width=module.get_expected_image_width(),
    image_height=module.get_expected_image_height(),
    images_mean=module.get_pretrained_images_mean(),
    images_std=module.get_pretrained_images_std(),
    dataset=None)

predict_reader = data_reader.data_generator(
    phase="predict", batch_size=1, data=data)

label_dict = dataset.label_dict()

# Step 3: switch to inference program
with fluid.program_guard(task.inference_program()):
    # Step 4: load pretrained parameters
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    pretrained_model_dir = os.path.join("cv_finetune_turtorial_demo", "best_model")
    fluid.io.load_persistables(exe, pretrained_model_dir)
    feeder = fluid.DataFeeder(feed_list=feed_list, place=place)
    # Step 5: predict
    for index, batch in enumerate(predict_reader()):
        result, = exe.run(
            feed=feeder.feed(batch), fetch_list=[task.variable('probs')])
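        # sort the class probabilities in descending order and take the top-1 index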
        predict_result = np.argsort(result[0])[::-1][0]
        print("input %i is %s, and the predict result is %s" %
              (index+1, data[index], label_dict[predict_result]))

 

 

At the WAVE SUMMIT 2019 deep learning developer summit, PaddlePaddle announced the "AI Studio 100-million-yuan compute support program". Users who have tried its Tesla V100 resources keep praising it and are looking forward to more compute. PaddlePaddle has also launched a poll on compute support; please pick your preferred way to claim compute power!
