Training an image classification model on a custom dataset with PaddleX: the whole process, from environment setup to inference deployment

A note up front: PaddleX is an easy-to-use, deployment-friendly toolkit released by PaddlePaddle, with a friendly interface. Its GitHub repository has a very detailed step-by-step tutorial, but that tutorial covers classification, segmentation, and detection models all in one place, so it is easy to get lost in it. This tutorial is for readers who are new to PaddleX and want a single path through image classification.

PaddleX official repository: https://github.com/PaddlePaddle/PaddleX

1. Environment preparation

Before training, set up a deep learning environment on the machine you will use. The steps below use Ubuntu 20.04 as an example.
1. Install the graphics card driver. The one-click install under "Additional Drivers" in Software & Updates is recommended.
2. Install CUDA and cuDNN. Make sure the cuDNN version matches the CUDA version.
3. Install Anaconda.
4. Use Anaconda to create a virtual environment.
5. Activate the new virtual environment and install paddlepaddle. Choose the build that matches the CUDA version and operating system you just set up. Installing with pip from a mirror source is recommended.
6. Install PaddleX:

pip install paddlex==2.1.0 -i https://mirror.baidu.com/pypi/simple

The paddlepaddle installation normally pulls in the pycocotools package, but in some cases pycocotools fails to install along with it. PaddleX depends on pycocotools, so if that happens, install it manually as follows:

pip install cython  
pip install pycocotools

At this point the environment is essentially ready.
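
Before moving on, it can help to confirm that the packages are visible from inside the virtual environment. The following is a minimal sketch using only the standard library; the function name check_packages is my own, and the package list simply mirrors what this section installs.

```python
import importlib.util

# Top-level module names for the packages installed in this section.
REQUIRED = ("paddle", "paddlex", "pycocotools")


def check_packages(names=REQUIRED):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'OK' if found else 'MISSING'}")
```

This only checks that the modules can be found, not that the GPU works. For that, PaddlePaddle ships its own check: python -c "import paddle; paddle.utils.run_check()".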

2. Data preparation

To train an image classification model, the dataset must be organized in the ImageNet format: one folder per category, so N categories means N folders, each holding the images of that category:

dataset/ # root directory of the image classification dataset
|--dog/ # all images in this folder belong to the dog category
|  |--d1.jpg
|  |--d2.jpg
|  |--...
|  |--...
|
|--...
|
|--snake/ # all images in this folder belong to the snake category
|  |--s1.jpg
|  |--s2.jpg
|  |--...
|  |--...
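
A quick way to confirm that the folders match this layout is to count the images in each class folder. The sketch below uses only the standard library; the set of image extensions is my assumption and should be adjusted to your data.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(dataset_dir):
    """Map each class folder name to the number of image files directly inside it."""
    counts = {}
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTS
            )
    return counts


if __name__ == "__main__":
    if Path("dataset").is_dir():
        for name, n in count_images_per_class("dataset").items():
            print(f"{name}: {n} images")
```

An empty folder or a very unbalanced count is worth fixing before splitting the dataset.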

Use the PaddleX command below to randomly split the dataset into a 70% training set, a 20% validation set, and a 10% test set. --dataset_dir is given as a relative path here; if the command reports that the folder cannot be found, use an absolute path instead. The --val_value 0.2 and --test_value 0.1 ratios can be adjusted to suit your data. The command writes the file lists (such as train_list.txt and val_list.txt) and labels.txt into the dataset folder; the training script in the next section reads them.

paddlex --split_dataset --format ImageNet --dataset_dir dataset --val_value 0.2 --test_value 0.1
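
To make the ratio arithmetic concrete, here is a minimal sketch of a random 70/20/10 split in plain Python. It is not PaddleX's implementation (which may shuffle or round differently); the function name, the seed argument, and the file names are illustrative only.

```python
import random


def split_samples(samples, val_value=0.2, test_value=0.1, seed=0):
    """Shuffle samples and split them into (train, val, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_value)
    n_test = int(len(shuffled) * test_value)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]  # whatever is left is the training set
    return train, val, test


if __name__ == "__main__":
    files = [f"dog/d{i}.jpg" for i in range(100)]  # hypothetical file names
    train, val, test = split_samples(files)
    print(len(train), len(val), len(test))  # 70 20 10
```

The training share is simply what remains after the validation and test shares are taken, which is why the command only takes --val_value and --test_value.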

3. Model training

Create a new file train.py in the same directory that contains the dataset/ folder, copy the following code into it, and then run python train.py to start training.

import paddlex as pdx
from paddlex import transforms as T


# Define the transforms used for training and validation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/transforms/transforms.md
train_transforms = T.Compose(
    [T.RandomCrop(crop_size=224), T.RandomHorizontalFlip(), T.Normalize()])

eval_transforms = T.Compose([
    T.ResizeByShort(short_size=256), T.CenterCrop(crop_size=224), T.Normalize()
])

# Define the datasets used for training and validation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/datasets.md
train_dataset = pdx.datasets.ImageNet(
    data_dir='dataset',
    file_list='dataset/train_list.txt',
    label_list='dataset/labels.txt',
    transforms=train_transforms,
    shuffle=True)

eval_dataset = pdx.datasets.ImageNet(
    data_dir='dataset',
    file_list='dataset/val_list.txt',
    label_list='dataset/labels.txt',
    transforms=eval_transforms)

# Initialize the model and train it
num_classes = len(train_dataset.labels)
model = pdx.cls.MobileNetV3_small(num_classes=num_classes)

# Training parameters
# Parameter descriptions and tuning notes: https://github.com/PaddlePaddle/PaddleX/tree/develop/docs/parameters.md
model.train(
    num_epochs=200,
    train_dataset=train_dataset,
    train_batch_size=32,
    eval_dataset=eval_dataset,
    lr_decay_epochs=[130, 160, 180],
    learning_rate=0.01,
    save_dir='output/mobilenetv3_small',
    use_vdl=True)
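
The lr_decay_epochs and learning_rate arguments together describe a step-decay schedule. The sketch below shows roughly how the learning rate would evolve over the 200 epochs. It assumes a decay factor of 0.1 at each boundary (I believe this matches the default of the lr_decay_gamma argument of train(), but check the parameter documentation linked in the code), no warm-up, and that the drop takes effect at the boundary epoch itself. The helper lr_at_epoch is my own and is not part of PaddleX.

```python
def lr_at_epoch(epoch, base_lr=0.01, decay_epochs=(130, 160, 180), gamma=0.1):
    """Learning rate for a 0-based epoch under piecewise (step) decay."""
    # Count how many decay boundaries have been reached so far.
    passed = sum(1 for boundary in decay_epochs if epoch >= boundary)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    for epoch in (0, 129, 130, 160, 180, 199):
        print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.6f}")
```

If you shorten num_epochs, move the decay epochs proportionally; otherwise the later boundaries are never reached and the learning rate stays higher than intended.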

To train on multiple GPU cards, for example 2 cards, run:

python -m paddle.distributed.launch --gpus 0,1 train.py

4. Visualizing training metrics

If use_vdl is set to True in the training code, the training process automatically saves a training log in VisualDL format to the vdl_log directory under save_dir (the path you specified). Start the VisualDL service with the following command to view the metrics:

visualdl --logdir output/mobilenetv3_small/vdl_log --port 6001

After the service starts, open http://localhost:6001 in a browser. VisualDL serves plain HTTP by default, so use http:// rather than https:// in the address.

5. Model prediction

Use PaddleX's built-in prediction interface to run prediction on an image:

import paddlex as pdx
test_jpg = 'test.jpg'
model = pdx.load_model('output/mobilenetv3_small/best_model')
result = model.predict(test_jpg)
print("Predict Result: ", result)
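
The printed result is easier to use once it is reduced to a label and a score. The sketch below assumes that model.predict() returns a list of dictionaries with the keys category_id, category, and score; verify this against the output of your PaddleX version. The sample values are made up, and top_prediction is my own helper.

```python
def top_prediction(result):
    """Return (category, score) of the highest-scoring entry in a result list."""
    best = max(result, key=lambda item: item["score"])
    return best["category"], best["score"]


if __name__ == "__main__":
    # Hypothetical output, shaped like the assumed return value of model.predict().
    sample = [
        {"category_id": 0, "category": "dog", "score": 0.93},
        {"category_id": 1, "category": "snake", "score": 0.07},
    ]
    category, score = top_prediction(sample)
    print(f"{category} ({score:.2%})")
```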

6. Model deployment

Training model format
A model folder saved by PaddleX during training mainly contains four files:

model.pdopt, the optimizer state from training
model.pdparams, the model parameters
model.yml, the model configuration file (including preprocessing parameters, model definition, etc.)
eval_details.json, the predictions and ground-truth values recorded during model evaluation

Note that the model saved during training cannot be deployed directly; it must first be exported to the deployment format.
Run the following command in a terminal to export the trained model to the format required for deployment:

paddlex --export_inference --model_dir=./output/mobilenetv3_small/best_model/ --save_dir=./inference_model

Deployment model format
When deploying on the server side, the model saved during training must be exported as an inference-format model. An inference-format model exported with PaddleX 2.0 consists of five files:

model.pdmodel, the model network structure
model.pdiparams, the model weights
model.pdiparams.info, the model weight names
model.yml, the model configuration file (including preprocessing parameters, model definition, etc.)
pipeline.yml, the pipeline configuration file that can be used with the PaddleX Manufacture SDK
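
After exporting, a small check that all five files listed above are present can save a confusing error at load time. This is a standard-library sketch; missing_inference_files is my own helper, not a PaddleX function.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = (
    "model.pdmodel",
    "model.pdiparams",
    "model.pdiparams.info",
    "model.yml",
    "pipeline.yml",
)


def missing_inference_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_inference_files("./inference_model")
    print("export looks complete" if not missing else f"missing: {missing}")
```

If files are reported missing right after a successful export, check whether your PaddleX version placed them in a subfolder of the save directory, and point the check (and the predictor) at that folder instead.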

After the deployment model is exported, it can be loaded and run with PaddleX's high-performance inference interface. The code is as follows:

import paddlex as pdx
predictor = pdx.deploy.Predictor('./inference_model')
result = predictor.predict(img_file='test.jpg')
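
To see what the deployed model delivers on your hardware, you can time repeated calls. The sketch below is a generic timing helper, not a PaddleX API: it accepts any zero-argument callable, so with a real predictor you would pass lambda: predictor.predict(img_file='test.jpg'). The warm-up and repeat counts are arbitrary choices.

```python
import time


def average_latency_ms(predict_fn, repeats=20, warmup=3):
    """Average wall-clock time of predict_fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        predict_fn()  # warm-up runs are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        predict_fn()
    return (time.perf_counter() - start) * 1000 / repeats


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a model.
    latency = average_latency_ms(lambda: sum(range(10000)))
    print(f"{latency:.3f} ms per call")
```

The warm-up calls matter because the first few predictions usually include one-time initialization cost and would otherwise skew the average.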

Origin blog.csdn.net/weixin_45921929/article/details/128271831