Foreword
MMPreTrain is an open source deep learning pre-training toolbox based on PyTorch, and a member of the OpenMMLab project.
The main features of MMPreTrain are:
- Supports diverse backbone networks and pre-trained models
- Supports multiple training strategies (supervised learning, unsupervised learning, multimodal learning, etc.)
- Provides a variety of training techniques
- Provides a large number of training configuration files
- High efficiency and high scalability
- Powerful toolbox to facilitate model analysis and experimentation
- Supports multiple inference tasks out of the box
  - Image classification
  - Image captioning
  - Visual question answering
  - Visual grounding
  - Retrieval (image-to-image, image-to-text, text-to-image)
- This article mainly demonstrates how to use MMPreTrain to fine-tune the vision_transformer model for image classification tasks
- The classification data is the Animal Faces dataset from the Kaggle platform, and the runtime environment is Kaggle GPU P100
Environment installation
- Because interpretability analysis of the model is required, the grad-cam package must be installed. The installation of mmcv was covered in my previous mmdetection and mmsegmentation tutorials, so it is not repeated here
- mmpretrain is best installed from source with git. If the git tool is not available, you can use mim install mmpretrain instead
- Finally, create new checkpoint, outputs, and data folders under the project folder, which are used to store the pre-trained model weights, the model output results, and the training data respectively.
from IPython import display
!pip install "grad-cam>=1.3.6"
!pip install -U openmim
!pip install -q /kaggle/input/frozen-packages-mmdetection/mmcv-2.0.1-cp310-cp310-linux_x86_64.whl
!git clone https://github.com/open-mmlab/mmpretrain.git
%cd mmpretrain
!mim install -e .
!mkdir checkpoint
!mkdir outputs
!mkdir data
display.clear_output()
- After the installation is complete, check the environment and the installed versions
import mmcv
from mmcv.ops import get_compiling_cuda_version, get_compiler_version
import mmpretrain
print('MMCV version', mmcv.__version__)
print('mmpretrain version', mmpretrain.__version__)
print('CUDA version', get_compiling_cuda_version())
print('Compiler version', get_compiler_version())
output:
MMCV version 2.0.1
mmpretrain version 1.0.0
CUDA version 11.8
Compiler version GCC 11.3
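If one of the imports above fails, it can be handy to query the installed distributions without importing them. The following is only a minimal sketch using the standard library; the helper name `installed_version` is made up for this illustration, and the distribution names are the ones installed earlier in this article.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names installed earlier in this article.
for name in ("mmcv", "mmpretrain", "grad-cam"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

A package reported as "not installed" points at the pip or mim step that needs to be rerun.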
- Because mmpretrain supports many kinds of tasks, when fine-tuning an image classification model we first check which models support the task. You can call the list_models function to check. Because we want to fine-tune the vision_transformer model, we can add the pattern vit for more precise filtering
from mmpretrain import list_models, inference_model
list_models(task='Image Classification',pattern='vit')
- Here we take the vit-base-p32_in21k-pre_3rdparty_in1k-384px model from the candidate list as an example
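To make the effect of the pattern argument concrete, here is a small stand-alone sketch of name filtering. It assumes shell-style wildcard matching on the start of the model name, which is an assumption about how list_models behaves and is not taken from its source; the list of names and the helper `filter_models` are hypothetical.

```python
import fnmatch

# Hypothetical subset of model names, for illustration only.
MODEL_NAMES = [
    "resnet50_8xb32_in1k",
    "vit-base-p16_32xb128-mae_in1k",
    "vit-base-p32_in21k-pre_3rdparty_in1k-384px",
    "swin-tiny_16xb64_in1k",
]


def filter_models(names, pattern):
    """Keep the names that match a shell-style wildcard pattern (prefix match assumed)."""
    return sorted(fnmatch.filter(names, pattern + "*"))


print(filter_models(MODEL_NAMES, "vit"))
```

Only the two names beginning with vit remain, which is the kind of narrowing we rely on before picking a model.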
Model inference
- Enter the configs/vision_transformer folder of the project to view the pre-trained weights and config files corresponding to the model
- Download the pre-trained weights and run inference on an example image
from mmpretrain import ImageClassificationInferencer
# path of the input image (relative to the mmpretrain repository root, our working directory)
img_path = 'demo/bird.JPEG'
model = 'vit-base-p32_in21k-pre_3rdparty_in1k-384px'
# pre-trained weights
pretrained = './checkpoints/vit-base-p32_in21k-pre-3rdparty_ft-64xb64_in1k-384_20210928-9cea8599.pth'
# run inference
inferencer = ImageClassificationInferencer(model=model, pretrained=pretrained, device='cuda:0')
result = inferencer(img_path, show_dir="./visualize/")
display.clear_output()
- View the inference results
result[0].keys()
output:
dict_keys(['pred_scores', 'pred_label', 'pred_score', 'pred_class'])
- Print the name of the category with the highest classification confidence, along with its confidence
# name of the category with the highest confidence
print(result[0]['pred_class'])
# confidence of that category
print('{:.3f}'.format(result[0]['pred_score']))
output:
house finch, linnet, Carpodacus mexicanus
0.985
Fine-tuning the model
- Copy the dataset to the data directory to prepare for training
import shutil
# copy the animal dataset
shutil.copytree('/kaggle/input/animal-faces/afhq', './data/animal')
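The copied dataset uses one sub-folder per class, which is the layout that a folder-based dataset class can read without an annotation file. The sketch below shows the idea in plain Python: sub-folder names are sorted and mapped to integer labels. It is an illustration only, not the MMPreTrain implementation; the helper `scan_class_folders` is hypothetical, and the class names cat, dog, and wild are assumed from the AFHQ dataset.

```python
import os
import tempfile


def scan_class_folders(root):
    """Map each sub-folder name (sorted) to a label and list (path, label) samples."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    samples = []
    for name in classes:
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            samples.append((os.path.join(folder, fname), class_to_idx[name]))
    return class_to_idx, samples


# Build a tiny stand-in for ./data/animal/train with empty placeholder files.
with tempfile.TemporaryDirectory() as root:
    for cls in ("wild", "cat", "dog"):
        os.makedirs(os.path.join(root, cls))
        open(os.path.join(root, cls, f"{cls}_0001.jpg"), "w").close()
    class_to_idx, samples = scan_class_folders(root)

print(class_to_idx)
print(len(samples))
```

Because the folders are sorted alphabetically in this sketch, cat receives label 0 regardless of the order in which the folders were created.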
Configuration file parsing
- The MMPreTrain configuration file is a bit different from the mmdetection and mmsegmentation configuration files. When you open vit-base-p32_64xb64_in1k-384px.py, the configuration file of vit-base-p32_in21k-pre_3rdparty_in1k-384px, you will find that only the data pipeline and processing method are explicitly defined in it.
- But in fact, the remaining data processing and optimizer parameters are implicitly defined through _base_. For details, please refer to the code comments below.
_base_ = [
    '../_base_/models/vit-base-p32.py',                # model configuration
    '../_base_/datasets/imagenet_bs64_pil_resize.py',  # dataset configuration
    '../_base_/schedules/imagenet_bs4096_AdamW.py',    # training schedule configuration
    '../_base_/default_runtime.py'                     # default runtime settings
]
# model setting
# input image size
model = dict(backbone=dict(img_size=384))
# dataset setting
# mean and std are given in 'RGB' channel order
data_preprocessor = dict(
    mean=[127.5, 127.5, 127.5],  # per-channel (RGB) mean used for normalization
    std=[127.5, 127.5, 127.5],   # per-channel (RGB) standard deviation used for normalization
    to_rgb=True,                 # whether to flip the channel order (BGR to RGB, or RGB to BGR)
)
train_pipeline = [
    dict(type='LoadImageFromFile'),                                # load the image
    dict(type='RandomResizedCrop', scale=384, backend='pillow'),   # random resized crop
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),     # random horizontal flip
    dict(type='PackInputs'),                                       # pack the image and label
]
test_pipeline = [
    dict(type='LoadImageFromFile'),                                       # load the image
    dict(type='ResizeEdge', scale=384, edge='short', backend='pillow'),   # resize the short edge to 384px
    dict(type='CenterCrop', crop_size=384),                               # center crop
    dict(type='PackInputs'),                                              # pack the image and label
]
train_dataloader = dict(dataset=dict(pipeline=train_pipeline))
val_dataloader = dict(dataset=dict(pipeline=test_pipeline))
test_dataloader = dict(dataset=dict(pipeline=test_pipeline))
# schedule setting
optim_wrapper = dict(clip_grad=dict(max_norm=1.0))
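The mean and std values of 127.5 in data_preprocessor may look arbitrary, so here is a small arithmetic sketch of what they do, assuming the usual per-channel normalization (pixel - mean) / std. With these values the 0..255 pixel range is mapped to -1..1. The helper `normalize` is written only for this illustration, and the second set of statistics is the widely used ImageNet mean and std.

```python
def normalize(pixel, mean, std):
    """Apply per-channel normalization: (pixel - mean) / std."""
    return [(p - m) / s for p, m, s in zip(pixel, mean, std)]


vit_mean = vit_std = [127.5, 127.5, 127.5]
imagenet_mean = [123.675, 116.28, 103.53]
imagenet_std = [58.395, 57.12, 57.375]

black, white = [0, 0, 0], [255, 255, 255]
print(normalize(black, vit_mean, vit_std))  # [-1.0, -1.0, -1.0]
print(normalize(white, vit_mean, vit_std))  # [1.0, 1.0, 1.0]
print(normalize(white, imagenet_mean, imagenet_std))
```

The two sets of statistics give different input ranges, so when fine-tuning it is usually safest to keep the normalization the pre-trained weights were trained with.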
- Open the ../_base_/models/vit-base-p32.py file to view the model configuration
model = dict(
    type='ImageClassifier',  # main model type (use `ImageClassifier` for image classification)
    backbone=dict(
        type='VisionTransformer',  # backbone type
        arch='b',
        img_size=224,   # input image size
        patch_size=32,  # patch size
        drop_rate=0.1,  # dropout rate
        init_cfg=[      # parameter initialization
            dict(
                type='Kaiming',
                layer='Conv2d',
                mode='fan_in',
                nonlinearity='linear')
        ]),
    neck=None,
    head=dict(
        type='VisionTransformerClsHead',  # classification head type
        num_classes=1000,  # number of classes
        in_channels=768,   # number of input channels
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),  # loss configuration
        topk=(1, 5),  # top-k accuracy settings, here top-1 and top-5
    ))
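The img_size and patch_size values above determine how many tokens the transformer sees. As a rough sketch, assuming the standard ViT design of non-overlapping square patches plus one class token, the sequence length can be computed as follows. The function `vit_token_count` is a hypothetical helper, not part of MMPreTrain.

```python
def vit_token_count(img_size, patch_size, with_cls_token=True):
    """Number of tokens for a square image split into non-overlapping square patches."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    patches_per_side = img_size // patch_size
    return patches_per_side ** 2 + (1 if with_cls_token else 0)


print(vit_token_count(224, 32))  # 50
print(vit_token_count(384, 32))  # 145
```

Going from 224 to 384 pixels almost triples the sequence length, which is why the 384px configuration has to override img_size explicitly.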
- Open the ../_base_/datasets/imagenet_bs64_pil_resize.py file to view the data configuration
dataset_type = 'ImageNet'  # dataset type
# preprocessing configuration
data_preprocessor = dict(
    num_classes=1000,
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,
)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224, backend='pillow'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='ResizeEdge', scale=256, edge='short', backend='pillow'),
    dict(type='CenterCrop', crop_size=224),
    dict(type='PackInputs'),
]
train_dataloader = dict(
    batch_size=64,  # batch size per GPU
    num_workers=5,  # number of data-loading workers per GPU
    dataset=dict(   # training dataset
        type=dataset_type,
        data_root='data/imagenet',
        split='train',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),  # default sampler
)
val_dataloader = dict(
    batch_size=64,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='data/imagenet',
        split='val',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
)
# validation metric: accuracy, here top-1 and top-5
val_evaluator = dict(type='Accuracy', topk=(1, 5))
test_dataloader = val_dataloader
test_evaluator = val_evaluator
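To see what the topk setting of the Accuracy evaluator measures, here is a small stand-alone sketch of top-k accuracy. It is a simplified illustration, not the evaluator's actual code; the scores and labels are hypothetical values for a 3-class problem.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Hypothetical scores for a 3-class problem (rows: samples, columns: classes).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
]
labels = [0, 2, 1, 2]
print(topk_accuracy(scores, labels, 1))  # 50.0
print(topk_accuracy(scores, labels, 3))  # 100.0
```

Once k reaches the number of classes, every sample counts as a hit, so a top-5 figure carries no information for a dataset with fewer than 5 classes.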
- Open the ../_base_/schedules/imagenet_bs4096_AdamW.py file to view the training schedule configuration
optim_wrapper = dict(
    optimizer=dict(type='AdamW', lr=0.003, weight_decay=0.3),  # use the AdamW optimizer
    # settings specific to ViT pre-training
    paramwise_cfg=dict(custom_keys={
        '.cls_token': dict(decay_mult=0.0),
        '.pos_embed': dict(decay_mult=0.0)
    }),
)
# learning rate schedule
param_scheduler = [
    # warm-up scheduler
    dict(
        type='LinearLR',
        start_factor=1e-4,
        by_epoch=True,
        begin=0,
        end=30,
        # update by iteration
        convert_to_iter_based=True),
    # main schedule
    dict(
        type='CosineAnnealingLR',
        T_max=270,
        by_epoch=True,
        begin=30,
        end=300,
    )
]
# train, val, and test settings: max_epochs and validation interval
train_cfg = dict(by_epoch=True, max_epochs=300, val_interval=1)
val_cfg = dict()
test_cfg = dict()
auto_scale_lr = dict(base_batch_size=4096)
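The combination of a LinearLR warm-up followed by CosineAnnealingLR can be pictured with a short closed-form sketch. This is a simplified per-epoch approximation written for this article, not the MMEngine implementation (which updates the warm-up per iteration), and it assumes the cosine phase decays to a minimum learning rate of 0. The default values mirror the schedule above.

```python
import math


def lr_at_epoch(epoch, base_lr=0.003, warmup_end=30, max_epochs=300,
                start_factor=1e-4, eta_min=0.0):
    """Approximate learning rate: linear warm-up, then cosine annealing to eta_min."""
    if epoch < warmup_end:
        factor = start_factor + (1.0 - start_factor) * epoch / warmup_end
        return base_lr * factor
    progress = (epoch - warmup_end) / (max_epochs - warmup_end)
    return eta_min + (base_lr - eta_min) * 0.5 * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 15, 30, 165, 300):
    print(epoch, f"{lr_at_epoch(epoch):.6f}")
```

The learning rate starts at a tiny fraction of the base value, reaches the base value at the end of the warm-up, and then decays smoothly toward zero by the last epoch.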
- Open the ../_base_/default_runtime.py file to view the default runtime settings
# default registry scope
default_scope = 'mmpretrain'
# configure the default hooks
default_hooks = dict(
    # record the time of each iteration
    timer=dict(type='IterTimerHook'),
    # print a log every 100 iterations
    logger=dict(type='LoggerHook', interval=100),
    # enable the parameter scheduler hook
    param_scheduler=dict(type='ParamSchedulerHook'),
    # save a checkpoint every epoch
    checkpoint=dict(type='CheckpointHook', interval=1),
    # set the sampler seed in distributed environments
    sampler_seed=dict(type='DistSamplerSeedHook'),
    # visualize validation results; disabled by default, set enable=True to turn it on
    visualization=dict(type='VisualizationHook', enable=False),
)
# environment configuration
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # multi-processing settings
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # distributed settings
    dist_cfg=dict(backend='nccl'),
)
# visualizer settings
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(type='UniversalVisualizer', vis_backends=vis_backends)
# log level
log_level = 'INFO'
# checkpoint to load from
load_from = None
# whether to resume training from the loaded checkpoint
resume = False
# default random seed
randomness = dict(seed=None, deterministic=False)
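The way a short configuration file overrides only a few keys of its _base_ files can be sketched as a recursive dictionary merge. The code below is a simplified illustration of that idea, not MMEngine's actual merging code (special keys such as `_delete_` are not modelled), and the helper `merge_config` is hypothetical. The values are taken from the configuration files shown in this section.

```python
import copy


def merge_config(base, override):
    """Recursively merge `override` into a copy of `base`; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Abridged base model settings from vit-base-p32.py.
base_model = dict(
    type="ImageClassifier",
    backbone=dict(type="VisionTransformer", arch="b", img_size=224, patch_size=32),
    head=dict(type="VisionTransformerClsHead", num_classes=1000, in_channels=768),
)
# The only model override made by the 384px configuration file.
child_model = dict(backbone=dict(img_size=384))

model = merge_config(base_model, child_model)
print(model["backbone"])
```

Only img_size changes, while arch, patch_size, and the whole head are inherited untouched, which is why the 384px file can stay so short.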
Modify the configuration file
- Based on the analysis above, there are two ways to modify the configuration file.
- The first is to write the contents of all 5 configuration files into a new configuration file, vit-base-p32_1xb64_in1k-384px_pets.py, and then modify them.
- First, apply the contents of vit-base-p32_64xb64_in1k-384px.py to the inherited key-value pairs, such as img_size=384 in model. train_pipeline and test_pipeline also need to be changed.
- Then change num_classes, dataset_type, train_dataloader, val_dataloader, val_evaluator, lr, param_scheduler, default_hooks, and randomness.
- Note that dataset_type needs to be changed to 'CustomDataset', and 'CustomDataset' has no split key, so delete the split key in train_dataloader and val_dataloader.
- Because the number of categories is small (fewer than 5), topk in val_evaluator is changed from (1, 5) to 1.
- lr is scaled in proportion to the batch size. As the 64xb64 in the configuration file name shows, the original total batch size is 64 * 64, so with our batch_size of 64 the scaling ratio for lr is 64/(64 * 64).
- Because only 100 epochs are trained, the end key of the LinearLR scheduler is also scaled proportionally, that is, divided by 3. The T_max, begin, and end keys of the CosineAnnealingLR scheduler change accordingly.
- Because the model may not have learned much in the first 20 epochs, validation is unnecessary during that period. Adding the val_begin key means that metrics are computed on the validation set starting from the 20th epoch. Validation also does not need to run every epoch, so val_interval is changed to once every 5 epochs.
- We want the model to save the weights automatically every 10 epochs, keep at most two checkpoints at a time, and automatically keep the checkpoint with the highest accuracy according to the metric
checkpoint = dict(type='CheckpointHook', interval=10, max_keep_ckpts=2, save_best='auto')
- Logging only once every 100 iterations is too infrequent, so we change the logging interval to 10
- Finally, fix the random seed through randomness
custom_config = """
model = dict(
    type='ImageClassifier',  # main model type (use `ImageClassifier` for image classification)
    backbone=dict(
        type='VisionTransformer',  # backbone type
        arch='b',
        img_size=384,   # input image size
        patch_size=32,  # patch size
        drop_rate=0.1,  # dropout rate
        init_cfg=[      # parameter initialization
            dict(
                type='Kaiming',
                layer='Conv2d',
                mode='fan_in',
                nonlinearity='linear')
        ]),
    neck=None,
    head=dict(
        type='VisionTransformerClsHead',  # classification head type
        num_classes=3,    # number of classes
        in_channels=768,  # number of input channels
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),  # loss configuration
        topk=(1, 5),  # top-k setting kept from the base config
    ))
dataset_type = 'CustomDataset'  # dataset type
# preprocessing configuration, using the values from vit-base-p32_64xb64_in1k-384px.py
data_preprocessor = dict(
    mean=[127.5, 127.5, 127.5],
    std=[127.5, 127.5, 127.5],
    to_rgb=True,
)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=384, backend='pillow'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='ResizeEdge', scale=384, edge='short', backend='pillow'),
    dict(type='CenterCrop', crop_size=384),
    dict(type='PackInputs'),
]
train_dataloader = dict(
    batch_size=64,  # batch size per GPU
    num_workers=2,  # number of data-loading workers per GPU
    dataset=dict(   # training dataset
        type=dataset_type,
        data_root='./data/animal/train',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),  # default sampler
)
val_dataloader = dict(
    batch_size=64,
    num_workers=2,
    dataset=dict(
        type=dataset_type,
        data_root='./data/animal/val',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
)
# validation metric: accuracy, only top-1 because there are just 3 classes
val_evaluator = dict(type='Accuracy', topk=1)
test_dataloader = val_dataloader
test_evaluator = val_evaluator
optim_wrapper = dict(
    # lr scaled by batch_size / original total batch size, i.e. 64 / (64 * 64)
    optimizer=dict(type='AdamW', lr=0.003 * 64 / (64 * 64), weight_decay=0.3),
    # settings specific to ViT pre-training
    paramwise_cfg=dict(custom_keys={
        '.cls_token': dict(decay_mult=0.0),
        '.pos_embed': dict(decay_mult=0.0)
    }),
    clip_grad=dict(max_norm=1.0)
)
# learning rate schedule
param_scheduler = [
    # warm-up scheduler
    dict(
        type='LinearLR',
        start_factor=1e-4,
        by_epoch=True,
        begin=0,
        end=10,
        # update by iteration
        convert_to_iter_based=True),
    # main schedule
    dict(
        type='CosineAnnealingLR',
        T_max=90,
        by_epoch=True,
        begin=10,
        end=100,
    )
]
# train, val, and test settings: max_epochs, first validation epoch, and validation interval
train_cfg = dict(by_epoch=True, max_epochs=100, val_begin=20, val_interval=5)
val_cfg = dict()
test_cfg = dict()
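# default registry scope
default_scope = 'mmpretrain'
# configure the default hooks
default_hooks = dict(
    # record the time of each iteration
    timer=dict(type='IterTimerHook'),
    # print a log every 10 iterations
    logger=dict(type='LoggerHook', interval=10),
    # enable the parameter scheduler hook
    param_scheduler=dict(type='ParamSchedulerHook'),
    # save every 10 epochs, keep at most 2 checkpoints, and keep the best one
    checkpoint=dict(type='CheckpointHook', interval=10, max_keep_ckpts=2, save_best='auto'),
    # set the sampler seed in distributed environments
    sampler_seed=dict(type='DistSamplerSeedHook'),
    # visualize validation results; disabled by default, set enable=True to turn it on
    visualization=dict(type='VisualizationHook', enable=False),
)
# environment configuration
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # multi-processing settings
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # distributed settings
    dist_cfg=dict(backend='nccl'),
)
# visualizer settings
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(type='UniversalVisualizer', vis_backends=vis_backends)
# log level
log_level = 'INFO'
# load the pre-trained weights so that training fine-tunes instead of starting from scratch
load_from = './checkpoints/vit-base-p32_in21k-pre-3rdparty_ft-64xb64_in1k-384_20210928-9cea8599.pth'
# output directory, matching the path used for inference later
work_dir = './work_dir'
# whether to resume training from the loaded checkpoint
resume = False
# fixed random seed
randomness = dict(seed=2023, deterministic=False)
"""
# write to vit-base-p32_1xb64_in1k-384px_pets.py
animal_config = './configs/vision_transformer/vit-base-p32_1xb64_in1k-384px_pets.py'
with open(animal_config, 'w') as f:
    f.write(custom_config)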
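The effect of interval=10, max_keep_ckpts=2, and save_best in the checkpoint hook can be pictured with a small simulation. This is a deliberately simplified model of the behaviour described above, not the hook's real logic; the function `simulate_checkpoints` and the validation accuracies are hypothetical, and ties keep the earlier epoch in this sketch.

```python
def simulate_checkpoints(max_epochs, interval, max_keep, val_scores):
    """Return (periodic checkpoint epochs still on disk, epoch of the best checkpoint)."""
    kept = []
    best_epoch, best_score = None, float("-inf")
    for epoch in range(1, max_epochs + 1):
        if epoch % interval == 0:
            kept.append(epoch)
            kept = kept[-max_keep:]  # drop the oldest checkpoints beyond the limit
        if epoch in val_scores and val_scores[epoch] > best_score:
            best_epoch, best_score = epoch, val_scores[epoch]
    return kept, best_epoch


# Hypothetical validation accuracies, validating every 5 epochs from epoch 20.
val_scores = {20: 97.1, 25: 98.4, 30: 99.2, 35: 99.0, 40: 99.2}
kept, best = simulate_checkpoints(40, 10, 2, val_scores)
print(kept, best)  # [30, 40] 30
```

Disk usage stays bounded by the two most recent periodic checkpoints, while the best checkpoint is tracked separately and survives even after its epoch has been rotated out.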
- The second method is to read the default configuration file first, and then change it through Python's dictionary-style access. The advantage is that only the parts that need to change are changed, and the logic is clear.
- The disadvantage is that some intermediate variables in the configuration file no longer take effect. For example, inside a configuration file dataset_type only needs to be defined once and can then be used directly by train_dataloader and val_dataloader, but with the dictionary approach it has to be changed twice
- The parameter changes are the same as above, but the code is much shorter
# read the configuration file
from mmengine import Config
cfg = Config.fromfile('./configs/vision_transformer/vit-base-p32_64xb64_in1k-384px.py')
max_epochs = 100
batch_size = 64
# the original schedule assumes a total batch size of 64 * 64
lr_scale_factor = batch_size / (64 * 64)
epoch_scale_factor = max_epochs / cfg.train_cfg.max_epochs
cfg.model.head.num_classes = 3
cfg.load_from = './checkpoints/vit-base-p32_in21k-pre-3rdparty_ft-64xb64_in1k-384_20210928-9cea8599.pth'
cfg.work_dir = './work_dir'
cfg.dataset_type = 'CustomDataset'
cfg.train_dataloader.batch_size = batch_size
cfg.train_dataloader.num_workers = 2
cfg.train_dataloader.dataset.type = cfg.dataset_type
cfg.train_dataloader.dataset.data_root = './data/animal/train'
# CustomDataset has no split key
del cfg.train_dataloader.dataset['split']
cfg.val_dataloader.batch_size = cfg.train_dataloader.batch_size
cfg.val_dataloader.num_workers = cfg.train_dataloader.num_workers
cfg.val_dataloader.dataset.data_root = './data/animal/val'
cfg.val_dataloader.dataset.type = cfg.dataset_type
del cfg.val_dataloader.dataset['split']
cfg.test_dataloader = cfg.val_dataloader
cfg.val_evaluator = dict(type='Accuracy', topk=1)
cfg.test_evaluator = cfg.val_evaluator
cfg.optim_wrapper.optimizer.lr = cfg.optim_wrapper.optimizer.lr * lr_scale_factor
# LinearLR scheduler: scale the warm-up end epoch and keep it an integer
cfg.param_scheduler[0].end = int(round(cfg.param_scheduler[0].end * epoch_scale_factor))
# CosineAnnealingLR scheduler: start where the warm-up ends and stop at max_epochs
cfg.param_scheduler[1].T_max = max_epochs - cfg.param_scheduler[0].end
cfg.param_scheduler[1].begin = cfg.param_scheduler[0].end
cfg.param_scheduler[1].end = max_epochs
cfg.train_cfg.max_epochs = max_epochs
cfg.train_cfg.val_begin = 20
cfg.train_cfg.val_interval = 5
cfg.default_hooks.checkpoint = dict(type='CheckpointHook', interval=10, max_keep_ckpts=2, save_best='auto')
cfg.default_hooks.logger.interval = 10
cfg.randomness.seed = 2023
#------------------------------------------------------
# write the modified configuration to a new file
animal_config = './configs/vision_transformer/vit-base-p32_1xb64_in1k-384px_pets.py'
with open(animal_config, 'w') as f:
    f.write(cfg.pretty_text)
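The scaling arithmetic in the script above can be checked by hand. The sketch below recomputes the learning rate and the scheduler boundaries in plain Python, assuming the linear scaling rule (learning rate proportional to batch size) and a proportional shrink of the warm-up period. The helper names `scale_lr` and `scale_schedule` are made up for this illustration.

```python
def scale_lr(base_lr, base_batch_size, batch_size):
    """Linear scaling rule: the learning rate is proportional to the batch size."""
    return base_lr * batch_size / base_batch_size


def scale_schedule(warmup_end, base_epochs, max_epochs):
    """Shrink the warm-up proportionally and let cosine annealing fill the rest."""
    new_warmup_end = round(warmup_end * max_epochs / base_epochs)
    return dict(
        warmup_end=new_warmup_end,
        cosine_begin=new_warmup_end,
        cosine_end=max_epochs,
        T_max=max_epochs - new_warmup_end,
    )


lr = scale_lr(0.003, 64 * 64, 64)
schedule = scale_schedule(30, 300, 100)
print(lr)
print(schedule)
```

With a batch size of 64 the learning rate shrinks by a factor of 64, and the 30-epoch warm-up of the 300-epoch schedule becomes 10 epochs followed by 90 epochs of cosine annealing.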
Start training
!python tools/train.py {animal_config}
- Since the output log is too long, it is not shown in full here. Only the validation log line of the checkpoint with the highest accuracy is printed
07/30 13:33:50 - mmengine - INFO - Epoch(val) [55][24/24] accuracy/top1: 99.9333 data_time: 0.2443 time: 0.5068
- The model reaches a top-1 accuracy of 99.93% on the validation set, which is a very good result
Model inference
- Load the model with the highest accuracy and perform inference on the image
import glob
ckpt_path = glob.glob('./work_dir/best_accuracy_top1*.pth')[0]
img_path = '/kaggle/input/animal-faces/afhq/train/cat/flickr_cat_000052.jpg'
inferencer = ImageClassificationInferencer(animal_config, pretrained=ckpt_path)
result = inferencer(img_path)
result
output:
[{
'pred_scores': array([9.9998045e-01, 1.3512783e-05, 6.0256166e-06], dtype=float32),
'pred_label': 0,
'pred_score': 0.9999804496765137,
'pred_class': 'cat'}]
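The four fields in this result are closely related: pred_label is the index of the largest entry of pred_scores, pred_score is that entry, and pred_class is the class name at that index. A minimal sketch using the scores printed above follows. The class order cat, dog, wild is an assumption for this illustration; only cat at index 0 is confirmed by the output.

```python
# Scores copied from the output above.
pred_scores = [9.9998045e-01, 1.3512783e-05, 6.0256166e-06]
# Assumed class order; only 'cat' at index 0 is confirmed by the output.
classes = ["cat", "dog", "wild"]

# Index of the highest score, its value, and the matching class name.
pred_label = max(range(len(pred_scores)), key=lambda i: pred_scores[i])
pred_score = pred_scores[pred_label]
pred_class = classes[pred_label]

print(pred_label, pred_class, f"{pred_score:.5f}")
print(sum(pred_scores))
```

The scores sum to approximately 1, so they can be read as a probability distribution over the three classes.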
Plot the confusion matrix
- We can plot the confusion matrix to further check the model accuracy
!python tools/analysis_tools/confusion_matrix.py \
    {animal_config} \
    {ckpt_path} \
    --show
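For readers who want to see what the plotted matrix contains, here is a small stand-alone sketch that counts a confusion matrix from label lists. It is independent of the script above; the labels are hypothetical, and in this sketch rows are true classes and columns are predicted classes.

```python
def confusion_matrix(true_labels, pred_labels, num_classes):
    """Count samples per (true class, predicted class) pair."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, pred in zip(true_labels, pred_labels):
        matrix[true][pred] += 1
    return matrix


classes = ["cat", "dog", "wild"]  # assumed class names
# Hypothetical labels for seven validation images.
true_labels = [0, 0, 1, 1, 2, 2, 2]
pred_labels = [0, 0, 1, 2, 2, 2, 1]

matrix = confusion_matrix(true_labels, pred_labels, len(classes))
for name, row in zip(classes, matrix):
    print(f"{name:>5}: {row}")
```

Entries on the diagonal are correct predictions, and any off-diagonal entry shows which pair of classes the model mixes up.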
Class activation map (CAM) visualization
- Use a class activation map (CAM) to interpret how the model classifies an image. For more parameter settings, please refer to the official documentation
!python tools/visualization/vis_cam.py \
    {img_path} \
    {animal_config} \
    {ckpt_path} \
    --method GradCAM \
    --save-path cam.jpg \
    --vit-like
display.clear_output()
from PIL import Image
Image.open('cam.jpg')
t-SNE visualization
- Dimensionality-reduction visualization shows whether the model separates the categories with clear boundaries, and also reveals categories that the model tends to confuse
!python tools/visualization/vis_tsne.py \
    {animal_config} \
    --checkpoint {ckpt_path}