Fine-tuning an image classification model with MMPreTrain

Foreword

  • MMPreTrain is an open-source deep learning pretraining toolbox based on PyTorch, and is a member of the OpenMMLab project
  • The main features of MMPreTrain are:
    • Supports diverse backbone networks and pre-trained models
    • Support multiple training strategies (supervised learning, unsupervised learning, multimodal learning, etc.)
    • Provides a variety of training techniques
    • A large number of training configuration files
    • High Efficiency and High Scalability
    • Powerful toolbox to facilitate model analysis and experimentation
    • Supports multiple inference tasks out of the box
      • Image classification
      • Image captioning
      • Visual Question Answering
      • Visual Grounding
      • Retrieval (image-to-image, image-to-text, text-to-image)
  • This article mainly demonstrates how to use MMPreTrain to fine-tune a vision_transformer model for an image classification task
  • The classification data is the Animal Faces dataset from the Kaggle platform, and the runtime environment is a Kaggle GPU P100

Environment installation

  • Because interpretable analysis of the model is needed, the grad-cam package must be installed; the mmcv installation method has already been covered in my previous mmdetection and mmsegmentation tutorials, so it is not repeated here

  • The best way to install mmpretrain is via git; if the git tool is not available, you can use mim install mmpretrain

  • Finally, create new checkpoints, outputs, and data folders under the project folder, which are used to store model pre-trained weights, model output results, and training data respectively

from IPython import display
!pip install "grad-cam>=1.3.6"
!pip install -U openmim
!pip install -q /kaggle/input/frozen-packages-mmdetection/mmcv-2.0.1-cp310-cp310-linux_x86_64.whl

!git clone https://github.com/open-mmlab/mmpretrain.git
%cd mmpretrain
!mim install -e .

!mkdir checkpoints
!mkdir outputs
!mkdir data

display.clear_output()
  • After the above installation is complete, let's check the environment and verify the installed versions
import mmcv
from mmcv.ops import get_compiling_cuda_version, get_compiler_version
import mmpretrain
print('MMCV version', mmcv.__version__)
print('mmpretrain version', mmpretrain.__version__)
print('CUDA version', get_compiling_cuda_version())
print('Compiler version', get_compiler_version())

output:

MMCV version 2.0.1
mmpretrain version 1.0.0
CUDA version 11.8
Compiler version GCC 11.3
  • Because mmpretrain supports many kinds of tasks, when fine-tuning an image classification model, first check which models support the task. You can call the list_models function to check; since we want to fine-tune a vision_transformer model, we can add the pattern vit for more precise filtering
from mmpretrain import list_models, inference_model
list_models(task='Image Classification', pattern='vit')
  • Here we take the vit-base-p32_in21k-pre_3rdparty_in1k-384px model from the candidate list as an example
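  • Before building a full inference pipeline, a single model from the list can be sanity-checked with the one-shot inference_model helper (already imported above) — a minimal sketch, assuming mmpretrain resolves and downloads the matching pre-trained weights from its metafile:
# Minimal sketch: one-off inference by model name; for repeated calls,
# build an ImageClassificationInferencer instead (see the next section).
result = inference_model('vit-base-p32_in21k-pre_3rdparty_in1k-384px',
                         'demo/bird.JPEG')
print(result['pred_class'])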

Model inference

  • Enter the project folder configs/vision_transformer to view the pre-trained weights and config files corresponding to the model


  • Download the pre-trained weights and run inference on an example image
from mmpretrain import ImageClassificationInferencer
# path of the input image
img_path = 'demo/bird.JPEG'
model = 'vit-base-p32_in21k-pre_3rdparty_in1k-384px'
# pre-trained weights
pretrained = './checkpoints/vit-base-p32_in21k-pre-3rdparty_ft-64xb64_in1k-384_20210928-9cea8599.pth'
# build the inferencer and predict
inferencer = ImageClassificationInferencer(model=model, pretrained=pretrained, device='cuda:0')

result = inferencer(img_path, show_dir="./visualize/")

display.clear_output()
  • View inference results
result[0].keys()

output:

dict_keys(['pred_scores', 'pred_label', 'pred_score', 'pred_class'])
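  • The raw score vector in pred_scores can also be ranked manually — a small sketch using plain numpy (not part of the original tutorial code):
import numpy as np

# Sketch: recover a top-3 ranking from the raw 1000-class score vector.
scores = result[0]['pred_scores']
top3 = np.argsort(scores)[::-1][:3]
for idx in top3:
    print(idx, scores[idx])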
  • Print the name of the category with the highest classification confidence, along with the confidence
# name of the class with the highest confidence
print(result[0]['pred_class'])
# confidence of that class
print('{:.3f}'.format(result[0]['pred_score']))

output:

house finch, linnet, Carpodacus mexicanus
0.985

Fine-tuning the model

  • Move the dataset into the data directory to prepare for training
import shutil

# move the animal dataset
shutil.copytree('/kaggle/input/animal-faces/afhq', './data/animal')
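  • A quick way to verify the copy and the CustomDataset-style layout (one sub-folder per class) — a sketch assuming the afhq train/val split:
import os

# Sketch: count images per class folder to confirm the dataset layout.
for split in ['train', 'val']:
    split_dir = f'./data/animal/{split}'
    for cls in sorted(os.listdir(split_dir)):
        print(split, cls, len(os.listdir(os.path.join(split_dir, cls))))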

Configuration file parsing

  • MMPreTrain configuration files are a bit different from those of mmdetection and mmsegmentation. When you open the vit-base-p32_64xb64_in1k-384px.py configuration file corresponding to vit-base-p32_in21k-pre_3rdparty_in1k-384px, you will find that only the data pipeline and processing method are explicitly defined in it
  • In fact, the data processing and optimizer parameters are implicitly defined in the files listed in _base_. For details, please refer to the code comments below
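  • To see the fully merged configuration with the _base_ inheritance resolved, mmengine can load and pretty-print it — a minimal sketch:
from mmengine import Config

# Sketch: Config.fromfile resolves the _base_ chain, so pretty_text
# shows the complete, merged configuration.
cfg = Config.fromfile('configs/vision_transformer/vit-base-p32_64xb64_in1k-384px.py')
print(cfg.pretty_text)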
_base_ = [
    '../_base_/models/vit-base-p32.py',                 # model config
    '../_base_/datasets/imagenet_bs64_pil_resize.py',   # dataset config
    '../_base_/schedules/imagenet_bs4096_AdamW.py',     # training schedule config
    '../_base_/default_runtime.py'                      # default runtime settings
]

# model setting
# input image size
model = dict(backbone=dict(img_size=384))

# dataset setting
# the input image channels are in 'RGB' order
data_preprocessor = dict(
    mean=[127.5, 127.5, 127.5],     # per-channel RGB mean for normalization
    std=[127.5, 127.5, 127.5],      # per-channel RGB std for normalization
    to_rgb=True,                    # whether to flip channels between BGR and RGB
)

train_pipeline = [
    dict(type='LoadImageFromFile'),                                 # read the image
    dict(type='RandomResizedCrop', scale=384, backend='pillow'),    # random resized crop
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),      # random horizontal flip
    dict(type='PackInputs'),                                        # pack the image and label
]

test_pipeline = [
    dict(type='LoadImageFromFile'),                                     # read the image
    dict(type='ResizeEdge', scale=384, edge='short', backend='pillow'), # resize the short edge to 384px
    dict(type='CenterCrop', crop_size=384),                             # center crop
    dict(type='PackInputs'),                                            # pack the image and label
]

train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
val_dataloader = dict(dataset=dict(pipeline=test_pipeline))
test_dataloader = dict(dataset=dict(pipeline=test_pipeline))

# schedule settings
optim_wrapper = dict(clip_grad=dict(max_norm=1.0))
  • Open the ../_base_/models/vit-base-p32.py file to view the model configuration
model = dict(
    type='ImageClassifier',         # main model type (for image classification, use `ImageClassifier`)
    backbone=dict(
        type='VisionTransformer',   # backbone type
        arch='b',
        img_size=224,       # input image size
        patch_size=32,      # patch size
        drop_rate=0.1,      # dropout rate
        init_cfg=[          # parameter initialization
            dict(
                type='Kaiming',
                layer='Conv2d',
                mode='fan_in',
                nonlinearity='linear')
        ]),
    neck=None,
    head=dict(
        type='VisionTransformerClsHead',    # classification head type
        num_classes=1000,                   # number of classes
        in_channels=768,                    # input channels
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),    # loss function config
        topk=(1, 5),    # evaluation metric, top-k accuracy; here top-1 and top-5
    ))
  • Open the ../_base_/datasets/imagenet_bs64_pil_resize.py file to view the data configuration
dataset_type = 'ImageNet'       # dataset type
data_preprocessor = dict(       # preprocessing config
    num_classes=1000,
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,
)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224, backend='pillow'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='ResizeEdge', scale=256, edge='short', backend='pillow'),
    dict(type='CenterCrop', crop_size=224),
    dict(type='PackInputs'),
]

train_dataloader = dict(
    batch_size=64,      # batch size per GPU
    num_workers=5,      # number of worker threads per GPU
    dataset=dict(       # training dataset
        type=dataset_type,
        data_root='data/imagenet',
        split='train',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),  # default sampler
)

val_dataloader = dict(
    batch_size=64,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='data/imagenet',
        split='val',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
)

# validation evaluation settings, using accuracy as the metric; here top-1 and top-5 accuracy
val_evaluator = dict(type='Accuracy', topk=(1, 5))

test_dataloader = val_dataloader
test_evaluator = val_evaluator
  • Open the ../_base_/schedules/imagenet_bs4096_AdamW.py file to view the training schedule configuration
optim_wrapper = dict(
    optimizer=dict(type='AdamW', lr=0.003, weight_decay=0.3),   # use the AdamW optimizer
    # ViT pre-training specific settings
    paramwise_cfg=dict(custom_keys={
        '.cls_token': dict(decay_mult=0.0),
        '.pos_embed': dict(decay_mult=0.0)
    }),
)

# learning rate schedule
param_scheduler = [
    # warm-up scheduler
    dict(
        type='LinearLR',
        start_factor=1e-4,
        by_epoch=True,
        begin=0,
        end=30,
        # update per iteration
        convert_to_iter_based=True),
    # main learning rate schedule
    dict(
        type='CosineAnnealingLR',
        T_max=270,
        by_epoch=True,
        begin=30,
        end=300,
    )
]

# train, val, test settings: max_epochs and validation interval
train_cfg = dict(by_epoch=True, max_epochs=300, val_interval=1)
val_cfg = dict()
test_cfg = dict()

auto_scale_lr = dict(base_batch_size=4096)
  • Open the ../_base_/default_runtime.py file to view the default runtime settings
# default registry scope for all modules
default_scope = 'mmpretrain'

# configure the default hooks
default_hooks = dict(
    # record the time of every iteration
    timer=dict(type='IterTimerHook'),

    # print logs every 100 iterations
    logger=dict(type='LoggerHook', interval=100),

    # enable the default parameter scheduler hook
    param_scheduler=dict(type='ParamSchedulerHook'),

    # save a checkpoint every epoch
    checkpoint=dict(type='CheckpointHook', interval=1),

    # set the sampler seed in distributed environments
    sampler_seed=dict(type='DistSamplerSeedHook'),

    # visualization of validation results, disabled by default; set True to enable
    visualization=dict(type='VisualizationHook', enable=False),
)

# configure the environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,

    # multi-process settings
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),

    # distributed settings
    dist_cfg=dict(backend='nccl'),
)

# set the visualization backend
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(type='UniversalVisualizer', vis_backends=vis_backends)

# set the log level
log_level = 'INFO'

# checkpoint to load from
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# default random seed
randomness = dict(seed=None, deterministic=False)

Modify the configuration file

  • Based on the analysis above, here are two ways to modify the configuration file.
  • The first is to write the information of all 5 configuration files into a new configuration file vit-base-p32_1xb64_in1k-384px_animal.py and then modify its contents.
  • First, fold the content of vit-base-p32_64xb64_in1k-384px.py into the inherited key-value pairs: for example img_size=384 in model, and train_pipeline and test_pipeline also need to be changed accordingly
  • Then change num_classes, dataset_type, train_dataloader, val_dataloader, val_evaluator, lr, param_scheduler, default_hooks, and randomness
  • Note that dataset_type needs to be changed to 'CustomDataset', and 'CustomDataset' has no split key, so the split key must be deleted from train_dataloader and val_dataloader
  • Because the number of categories is small (fewer than 5), topk in val_evaluator is changed from (1, 5) to 1
  • lr is scaled proportionally to the batch size: as the config file name 64xb64 indicates, the original total batch size is 64 * 64, so with a single-GPU batch_size of 64 the scaling factor is 64 / (64 * 64) (see the arithmetic sketch after this list)
  • Because only 100 epochs are trained, the end key of the LinearLR scheduler is also scaled proportionally, i.e. divided by 3; T_max, begin, and end of the CosineAnnealingLR scheduler change correspondingly
  • Because the model may not produce meaningful results in the first 20 epochs, there is no need to validate during them. Adding the val_begin key means validation metrics are computed starting from the 20th epoch, and validation does not need to run every epoch; here it is changed to once every 5 epochs
  • We want the model to automatically save a checkpoint every 10 epochs, keep at most two checkpoints at a time, and automatically keep the weights with the highest accuracy according to the metric: checkpoint = dict(type='CheckpointHook', interval=10, max_keep_ckpts=2, save_best='auto')
  • The logging interval of 100 (unit: iterations) is a bit too sparse; we change it to 10
  • Finally, fix the random seed in randomness
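  • The scaling arithmetic described above, as a small sketch (base values taken from the config names; the rest is plain arithmetic):
# Sketch of the lr / schedule scaling described above.
base_lr = 0.003              # from imagenet_bs4096_AdamW.py
base_total_batch = 64 * 64   # "64xb64" = 64 GPUs x 64 images per GPU
new_batch = 64               # single P100, batch size 64
print('scaled lr:', base_lr * new_batch / base_total_batch)

base_epochs, new_epochs = 300, 100
warmup_end = 30 * new_epochs // base_epochs   # LinearLR end: 30 -> 10
print('LinearLR end:', warmup_end)
print('CosineAnnealingLR T_max:', new_epochs - warmup_end)  # 90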
custom_config = """
model = dict(
    type='ImageClassifier',         # main model type (for image classification, use `ImageClassifier`)
    backbone=dict(
        type='VisionTransformer',   # backbone type
        arch='b',
        img_size=384,       # input image size
        patch_size=32,      # patch size
        drop_rate=0.1,      # dropout rate
        init_cfg=[          # parameter initialization
            dict(
                type='Kaiming',
                layer='Conv2d',
                mode='fan_in',
                nonlinearity='linear')
        ]),
    neck=None,
    head=dict(
        type='VisionTransformerClsHead',    # classification head type
        num_classes=3,      # number of classes
        in_channels=768,    # input channels
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),    # loss function config
        topk=(1, 5),    # evaluation metric, top-k accuracy; here top-1 and top-5
    ))

dataset_type = 'CustomDataset'  # dataset type
data_preprocessor = dict(       # preprocessing config
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,
)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=384, backend='pillow'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='ResizeEdge', scale=384, edge='short', backend='pillow'),
    dict(type='CenterCrop', crop_size=384),
    dict(type='PackInputs'),
]

train_dataloader = dict(
    batch_size=64,      # batch size per GPU
    num_workers=2,      # number of worker threads per GPU
    dataset=dict(       # training dataset
        type=dataset_type,
        data_root='./data/animal/train',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),  # default sampler
)

val_dataloader = dict(
    batch_size=64,
    num_workers=2,
    dataset=dict(
        type=dataset_type,
        data_root='./data/animal/val',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
)

# validation evaluation settings, using top-1 accuracy as the metric
val_evaluator = dict(type='Accuracy', topk=1)

test_dataloader = val_dataloader
test_evaluator = val_evaluator

optim_wrapper = dict(
    optimizer=dict(type='AdamW', lr=0.003 * 64 / (64 * 64), weight_decay=0.3),
    # ViT pre-training specific settings
    paramwise_cfg=dict(custom_keys={
        '.cls_token': dict(decay_mult=0.0),
        '.pos_embed': dict(decay_mult=0.0)
    }),
    clip_grad=dict(max_norm=1.0)
)

# learning rate schedule
param_scheduler = [
    # warm-up scheduler
    dict(
        type='LinearLR',
        start_factor=1e-4,
        by_epoch=True,
        begin=0,
        end=10,
        # update per iteration
        convert_to_iter_based=True),
    # main learning rate schedule
    dict(
        type='CosineAnnealingLR',
        T_max=90,
        by_epoch=True,
        begin=10,
        end=100,
    )
]

# train, val, test settings: max_epochs, validation start epoch and interval
train_cfg = dict(by_epoch=True, max_epochs=100, val_begin=20, val_interval=5)
val_cfg = dict()
test_cfg = dict()

# default registry scope for all modules
default_scope = 'mmpretrain'

# configure the default hooks
default_hooks = dict(
    # record the time of every iteration
    timer=dict(type='IterTimerHook'),

    # print logs every 10 iterations
    logger=dict(type='LoggerHook', interval=10),

    # enable the default parameter scheduler hook
    param_scheduler=dict(type='ParamSchedulerHook'),

    # save every 10 epochs, keep at most 2 checkpoints, keep the best by accuracy
    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=2, save_best='auto'),

    # set the sampler seed in distributed environments
    sampler_seed=dict(type='DistSamplerSeedHook'),

    # visualization of validation results, disabled by default; set True to enable
    visualization=dict(type='VisualizationHook', enable=False),
)

# configure the environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,

    # multi-process settings
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),

    # distributed settings
    dist_cfg=dict(backend='nccl'),
)

# set the visualization backend
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(type='UniversalVisualizer', vis_backends=vis_backends)

# set the log level
log_level = 'INFO'

# load the pre-trained weights for fine-tuning (otherwise training starts from scratch)
load_from = './checkpoints/vit-base-p32_in21k-pre-3rdparty_ft-64xb64_in1k-384_20210928-9cea8599.pth'

# whether to resume training from the loaded checkpoint
resume = False

# fixed random seed
randomness = dict(seed=2023, deterministic=False)
"""
# write to the vit-base-p32_1xb64_in1k-384px_animal.py file
animal_config = './configs/vision_transformer/vit-base-p32_1xb64_in1k-384px_animal.py'
with open(animal_config, 'w') as f:
    f.write(custom_config)
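  • As a quick sanity check (a sketch, not required by the tutorial), the freshly written file can be loaded back with mmengine to confirm it parses and the overrides took effect:
from mmengine import Config

# Sketch: re-load the config we just wrote and spot-check two overrides.
check = Config.fromfile(animal_config)
print(check.model.head.num_classes)   # expect 3
print(check.train_cfg.max_epochs)     # expect 100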
  • The second method is to first read the default configuration file and then change it through Python's dictionary access. The advantage is that only the parts that need changing are touched, and the logic stays clear
  • The disadvantage is that intermediate variables in the configuration file no longer help: in a config file you can define dataset_type once and reuse it in train_dataloader and val_dataloader, but with dictionary access it has to be changed in both places
  • The parameter changes are the same as above, with much less code
# read the configuration file
from mmengine import Config
cfg = Config.fromfile('./configs/vision_transformer/vit-base-p32_64xb64_in1k-384px.py')
max_epochs = 100
batch_size = 64
lr_scale_factor = batch_size/(64 * 64)
epoch_scale_factor = max_epochs/cfg.train_cfg.max_epochs

cfg.model.head.num_classes = 3

cfg.load_from = './checkpoints/vit-base-p32_in21k-pre-3rdparty_ft-64xb64_in1k-384_20210928-9cea8599.pth'
cfg.work_dir = './work_dir'

cfg.dataset_type = 'CustomDataset'

cfg.train_dataloader.batch_size = batch_size
cfg.train_dataloader.num_workers = 2
cfg.train_dataloader.dataset.type = cfg.dataset_type
cfg.train_dataloader.dataset.data_root = './data/animal/train'
del cfg.train_dataloader.dataset['split']

cfg.val_dataloader.batch_size = cfg.train_dataloader.batch_size
cfg.val_dataloader.num_workers = cfg.train_dataloader.num_workers
cfg.val_dataloader.dataset.data_root = './data/animal/val'
cfg.val_dataloader.dataset.type = cfg.dataset_type
del cfg.val_dataloader.dataset['split']

cfg.test_dataloader = cfg.val_dataloader

cfg.val_evaluator = dict(type='Accuracy', topk=1)
cfg.test_evaluator = cfg.val_evaluator

cfg.optim_wrapper.optimizer.lr = cfg.optim_wrapper.optimizer.lr * lr_scale_factor

# LinearLR scheduler end epoch
cfg.param_scheduler[0].end = cfg.param_scheduler[0].end * epoch_scale_factor

# CosineAnnealingLR scheduler
cfg.param_scheduler[1].T_max = max_epochs - cfg.param_scheduler[0].end
cfg.param_scheduler[1].begin = cfg.param_scheduler[0].end
cfg.param_scheduler[1].end = max_epochs

cfg.train_cfg.max_epochs = max_epochs
cfg.train_cfg.val_begin = 20
cfg.train_cfg.val_interval = 5

cfg.default_hooks.checkpoint = dict(type='CheckpointHook', interval=10, max_keep_ckpts=2, save_best='auto')
cfg.default_hooks.logger.interval = 10

cfg.randomness.seed = 2023

#------------------------------------------------------
animal_config = './configs/vision_transformer/vit-base-p32_1xb64_in1k-384px_animal.py'
with open(animal_config, 'w') as f:
    f.write(cfg.pretty_text)
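  • Before launching training, printing a couple of the modified fields confirms the overrides landed (a small sketch):
# Sketch: spot-check the modified fields before training.
print(cfg.optim_wrapper.optimizer.lr)   # scaled learning rate
print(cfg.train_cfg)                    # max_epochs / val_begin / val_interval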

Start training

!python tools/train.py {animal_config}
  • Since the training log is too long to show in full, only the validation record of the most accurate checkpoint is shown here
07/30 13:33:50 - mmengine - INFO - Epoch(val) [55][24/24]    accuracy/top1: 99.9333  data_time: 0.2443  time: 0.5068
  • It can be seen that the model reaches 99.93% accuracy on the validation set, which is an excellent result
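  • To inspect the whole validation-accuracy curve rather than a single log line, the scalars written by mmengine's LocalVisBackend under the work dir can be parsed — a sketch; the vis_data/scalars.json layout is an assumption about the default backend:
import glob
import json

# Sketch: LocalVisBackend writes one JSON record per line; keep the
# validation records that carry 'accuracy/top1' (path layout assumed).
log_file = sorted(glob.glob('./work_dir/*/vis_data/scalars.json'))[-1]
with open(log_file) as f:
    for line in f:
        rec = json.loads(line)
        if 'accuracy/top1' in rec:
            print(rec.get('step'), rec['accuracy/top1'])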

Model inference

  • Load the checkpoint with the highest accuracy and run inference on an image
import glob
ckpt_path = glob.glob('./work_dir/best_accuracy_top1*.pth')[0]
img_path = '/kaggle/input/animal-faces/afhq/train/cat/flickr_cat_000052.jpg'

inferencer = ImageClassificationInferencer(animal_config, pretrained=ckpt_path)

result = inferencer(img_path)
result

output:

[{'pred_scores': array([9.9998045e-01, 1.3512783e-05, 6.0256166e-06], dtype=float32),
  'pred_label': 0,
  'pred_score': 0.9999804496765137,
  'pred_class': 'cat'}]

Plot the confusion matrix

  • We can plot the confusion matrix to further examine the model's per-class accuracy
!python tools/analysis_tools/confusion_matrix.py \
    {animal_config} \
    {ckpt_path} \
    --show

Class Activation Map (CAM) visualization

  • Use the class activation map (CAM) to interpret what the classifier attends to in the image. For more parameter settings, please refer to the official documentation
!python tools/visualization/vis_cam.py \
        {img_path} \
        {animal_config} \
        {ckpt_path} \
        --method GradCAM \
        --save-path cam.jpg \
        --vit-like
display.clear_output()
from PIL import Image
Image.open('cam.jpg')


t-SNE visualization

  • Dimensionality-reduction visualization shows whether the model separates the categories with clear boundaries, and also reveals categories the model tends to confuse
!python tools/visualization/vis_tsne.py \
    {animal_config} \
    --checkpoint {ckpt_path}

Origin: blog.csdn.net/qq_20144897/article/details/132049251