Open-source pre-training framework MMPretrain official documentation (overview, environment installation and verification, basic user guide)

MMPretrain is a newly upgraded open-source pre-training framework. It aims to provide a variety of powerful pre-trained backbones and to support different pre-training strategies. MMPretrain is derived from the well-known open-source projects MMClassification and MMSelfSup, and adds many exciting new features. The pre-training stage is crucial for visual recognition, and with rich and powerful pre-trained models we can improve a wide range of downstream vision tasks.

The main goal of the codebase is to be an accessible and user-friendly library that simplifies research and engineering. Below we detail the properties and design of the different parts of MMPretrain.

1. MMPretrain practice roadmap


To help users get up and running quickly with MMPretrain, we recommend following the practical roadmap we created for the library:

(1) For users who want to try MMPretrain, we recommend reading the getting started section to understand the environment setup.

(2) For basic use, we recommend that users refer to the user guide to utilize the various algorithms to obtain pre-trained models and evaluate their performance on downstream tasks.

(3) For those who wish to customize their own algorithms, we provide an advanced guide that includes tips and rules for modifying the code.

(4) To find the pre-trained model you want, users can check the ModelZoo, which summarizes the various backbones and pre-training methods and introduces the different algorithms.

(5) In addition, we provide analysis and visualization tools to help diagnose algorithms.

2. Environment configuration and installation

1. Prerequisites

In this section, we will demonstrate how to prepare an environment with PyTorch.
MMPretrain is available for Linux, Windows, and macOS. It requires Python 3.7+, CUDA 10.2+ and PyTorch 1.8+.

2. Installation

Create a conda environment and activate it:
conda create --name mmpretrain python=3.8 -y  # create the environment
conda activate mmpretrain   # activate the environment
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch  # install PyTorch and torchvision (official channel)

# If the network is unreliable, you can install from a mirror instead
pip3 install torch==1.8.2+cu102 torchvision==0.9.2+cu102 torchaudio===0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html  -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

# Verify that the installation succeeded

>>> import torch
>>> import torchvision
>>> torch.__version__
'1.8.2+cu102'
>>> torchvision.__version__
'0.9.2+cu102'

3. Install mmpretrain from source

In this case, install mmpretrain from source:

git clone https://github.com/open-mmlab/mmpretrain.git
cd mmpretrain
pip install -U openmim  -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
mim install -e .

"-e" means to install the project in editable mode, so any local modifications made to the code will take effect without reinstalling.
Install multimodal support (optional)
Additional dependencies are required for multimodal models in MMPretrain. To install these dependencies, you can add [multimodal] during installation. For example:

# Install from source
mim install -e ".[multimodal]"

# Install as a Python package
mim install "mmpretrain[multimodal]>=1.0.0rc8"

4. Verify the installation

To verify that MMPretrain is installed correctly, we provide some sample code to run the inference demo.
Option (a). If you installed mmpretrain from source, just run the following command:

python demo/image_demo.py demo/demo.JPEG resnet18_8xb32_in1k --device cpu

You will see an output dictionary in the terminal, including pred_label, pred_score and pred_class.
Option (b). If you installed mmpretrain as a Python package, open a Python interpreter and copy and paste the following code:

from mmpretrain import get_model, inference_model

model = get_model('resnet18_8xb32_in1k', device='cpu')  # or device='cuda:0'
inference_model(model, 'demo/demo.JPEG')

You'll see a printed dictionary, including the predicted labels, scores, and class names.

Notice

Here resnet18_8xb32_in1k is the model name. You can use mmpretrain.list_models to explore all models, or search for them in the model zoo summary.
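For example, a quick look-up from a Python shell (the printed list is abbreviated and indicative only):

from mmpretrain import list_models

# Unix-style pattern matching: '*' matches any characters
print(list_models('*resnet18*'))
# e.g. ['resnet18_8xb16_cifar10', 'resnet18_8xb32_in1k', ...]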

3. Understand the configuration

To manage the various settings of a deep learning experiment, we use configuration files to record all of them. The configuration system has a modular and inheritable design; for more details, refer to the MMEngine tutorial.

Usually, we use Python files as configuration files. All configuration files are placed under the configs folder, and the directory structure is as follows:

MMPretrain/
    ├── configs/
    │   ├── _base_/                       # primitive configuration folder
    │   │   ├── datasets/                      # primitive datasets
    │   │   ├── models/                        # primitive models
    │   │   ├── schedules/                     # primitive schedules
    │   │   └── default_runtime.py             # primitive runtime setting
    │   ├── beit/                         # BEiT algorithms folder
    │   ├── mae/                          # MAE algorithms folder
    │   ├── mocov2/                       # MoCoV2 algorithms folder
    │   ├── resnet/                       # ResNet algorithms folder
    │   ├── swin_transformer/             # Swin algorithms folder
    │   ├── vision_transformer/           # ViT algorithms folder
    │   ├── ...
    └── ...

If you want to inspect a configuration file, you can run the following command to print the complete configuration:

python tools/misc/print_config.py /PATH/TO/CONFIG

This article mainly explains the structure of the configuration file and how to modify it based on the existing configuration file. Let's take the ResNet50 configuration file as an example and explain it line by line.

1. Configuration structure

There are four kinds of basic component files under the configs/_base_ folder, namely:
models
datasets
schedules
runtime

We refer to all configuration files in the _base_ folder as primitive configuration files. You can easily build a complete training configuration by inheriting from several primitive configuration files.
For ease of understanding, we take the ResNet50 configuration file as an example and comment each line.

_base_ = [                                    # This config file will inherit all config files in `_base_`.
    '../_base_/models/resnet50.py',           # model settings
    '../_base_/datasets/imagenet_bs32.py',    # data settings
    '../_base_/schedules/imagenet_bs256.py',  # schedule settings
    '../_base_/default_runtime.py'            # runtime settings
]

1. Model settings

This primitive configuration file contains a dict variable model, which mainly describes the network structure and the loss function:

type:

The type of model to build; we support multiple tasks.
For image classification tasks, it is usually ImageClassifier; you can find more details in the API documentation.
For self-supervised learning, there are several SelfSupervisors, such as MoCoV2, BEiT, MAE, etc. You can find more details in the API documentation.
For image retrieval tasks, it is typically ImageToImageRetriever; more details can be found in the API documentation.

Usually, we use the type field to specify the class of the component, and use other fields to pass the initialization parameters of the class. The registry tutorial describes this in detail.

Here we take ImageClassifier as an example to explain its initialization fields:

backbone:

Backbone settings. The backbone network is the main network for extracting input features, such as ResNet, Swin Transformer, Vision Transformer, etc. All available backbone networks can be found in the API documentation .
For self-supervised learning, some backbones are reimplemented, you can find more details in the API documentation.

neck:

Neck setting. The neck is an intermediate module connecting the backbone and the head, just like GlobalAveragePooling. All available necks can be found in the API documentation.

head:

Task header settings. A head is a task-related component that performs a specified task, such as image classification or self-supervised training. All available headers can be found in the API documentation.

loss:

The loss function to optimize, such as CrossEntropyLoss, LabelSmoothLoss, PixelReconstructionLoss, etc. All available losses can be found in the API documentation.

data_preprocessor:

Component for preprocessing the model input during the forward pass. See the documentation for more details.

train_cfg:

Additional settings during training. In ImageClassifier, we mainly use it to specify batch augmentation settings, such as Mixup and CutMix. See the documentation for more details.

The following is the model primitive configuration of the ResNet50 configuration file, configs/_base_/models/resnet50.py:

model = dict(
    type='ImageClassifier',     # The type of the main model (here it is for the image classification task).
    backbone=dict(
        type='ResNet',          # The type of the backbone module.
        # All fields except `type` come from the __init__ method of class `ResNet`
        # and you can find them from https://mmpretrain.readthedocs.io/en/latest/api/generated/mmpretrain.models.backbones.ResNet.html
        depth=50,
        num_stages=4,
        out_indices=(3, ),
        frozen_stages=-1,
        style='pytorch'),
    neck=dict(type='GlobalAveragePooling'),    # The type of the neck module.
    head=dict(
        type='LinearClsHead',     # The type of the classification head module.
        # All fields except `type` come from the __init__ method of class `LinearClsHead`
        # and you can find them from https://mmpretrain.readthedocs.io/en/latest/api/generated/mmpretrain.models.heads.LinearClsHead.html
        num_classes=1000,
        in_channels=2048,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
    ))
class mmpretrain.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, expansion=None, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3,), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg=dict(type='BN', requires_grad=True), norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])], drop_path_rate=0.0)

class mmpretrain.models.heads.LinearClsHead(num_classes, in_channels, init_cfg=dict(type='Normal', layer='Linear', std=0.01), **kwargs)
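To make the registry mechanism concrete, here is a minimal sketch (assuming mmpretrain is installed; MODELS.build is the standard MMEngine registry API) of how such a model dict is turned into an actual PyTorch module:

from mmpretrain.registry import MODELS

model_cfg = dict(
    type='ImageClassifier',
    backbone=dict(type='ResNet', depth=50, num_stages=4,
                  out_indices=(3, ), style='pytorch'),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(type='LinearClsHead', num_classes=1000, in_channels=2048,
              loss=dict(type='CrossEntropyLoss', loss_weight=1.0)),
)

# the `type` field selects the registered class; all remaining fields
# are passed to that class's __init__ method
model = MODELS.build(model_cfg)
print(type(model))  # an ImageClassifier instance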

2. Data setting

This raw configuration file contains information for building data loaders and evaluators:

data_preprocessor:

Model input preprocessing configuration, same as model.data_preprocessor but with lower priority.

train_evaluator | val_evaluator | test_evaluator:

Settings for building the evaluators or metrics; see the evaluation tutorial.

train_dataloader | val_dataloader | test_dataloader:

Data loader settings:

batch_size: batch size per GPU.
num_workers: the number of workers to fetch data per GPU.
sampler: the settings of the sampler.
persistent_workers: whether to keep the worker processes alive after completing an epoch.
dataset: the settings of the dataset.

    type: the type of dataset; we support CustomDataset, ImageNet, and many other datasets, refer to the documentation.

    pipeline: the data transform pipeline. You can learn how to design a pipeline in this tutorial.

The following is the data primitive configuration configs/_base_/datasets/imagenet_bs32.py used in the ResNet50 configuration:

dataset_type = 'ImageNet'   # the dataset type of the training samples
# preprocessing configuration
data_preprocessor = dict(
    # Input image data channels in 'RGB' order
    mean=[123.675, 116.28, 103.53],    # Input image normalized channel mean in RGB order
    std=[58.395, 57.12, 57.375],       # Input image normalized channel std in RGB order
    to_rgb=True,                       # Whether to flip the channel from BGR to RGB or RGB to BGR
)

train_pipeline = [
    dict(type='LoadImageFromFile'),     # read image
    dict(type='RandomResizedCrop', scale=224),     # random scaling and cropping
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),   # random horizontal flip
    dict(type='PackInputs'),         # prepare images and labels
]

test_pipeline = [
    dict(type='LoadImageFromFile'),     # read image
    dict(type='ResizeEdge', scale=256, edge='short'),  # scale the short side to 256
    dict(type='CenterCrop', crop_size=224),     # center crop
    dict(type='PackInputs'),                 # prepare images and labels
]

# Construct training set dataloader
train_dataloader = dict(
    batch_size=32,                     # batch size per GPU
    num_workers=5,                     # number of workers to fetch data per GPU
    dataset=dict(                      # training dataset
        type=dataset_type,
        data_root='data/imagenet',
        ann_file='meta/train.txt',
        data_prefix='train',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),   # default sampler
    persistent_workers=True,           # keep the worker processes alive, which shortens the preparation time of each epoch
)

# Construct the validation set dataloader
val_dataloader = dict(
    batch_size=32,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='data/imagenet',
        ann_file='meta/val.txt',
        data_prefix='val',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
    persistent_workers=True,
)
# The settings of the evaluation metrics for validation. We use top-1 and top-5 accuracy here.
val_evaluator = dict(type='Accuracy', topk=(1, 5))

test_dataloader = val_dataloader  # The settings of the dataloader for the test dataset, which is the same as val_dataloader
test_evaluator = val_evaluator    # The settings of the evaluation metrics for test, which is the same as val_evaluator
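To see what the pipeline does to a single sample, here is a small sketch (building transforms through the registry is standard MMEngine usage; the demo image path and label are assumptions):

from mmpretrain.registry import TRANSFORMS

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]
transforms = [TRANSFORMS.build(t) for t in train_pipeline]

# each transform consumes and returns a dict describing one sample
data = dict(img_path='demo/demo.JPEG', gt_label=0)
for t in transforms:
    data = t(data)
print(data['inputs'].shape)  # packed image tensor, e.g. torch.Size([3, 224, 224])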

3. Schedule setting

This primitive config file mainly contains the training strategy settings and the settings of the training, validation and test loops:

optim_wrapper: settings for the optimizer wrapper. We use the optimizer wrapper to customize the optimization process.
optimizer: supports all PyTorch optimizers; refer to the relevant MMEngine documentation.
paramwise_cfg: set different optimization hyper-parameters according to the parameter type or name; refer to the relevant learning strategy documentation.
accumulative_counts: accumulate gradients over several backward passes before updating the parameters once, instead of updating after every backward pass. You can use it to simulate a large batch size with small batches.
param_scheduler: the optimizer parameter scheduling strategy.

You can use this to specify the learning rate and momentum curves during training. See the MMEngine documentation for more details.

train_cfg | val_cfg | test_cfg: settings of the training, validation and test loops; please refer to the relevant MMEngine documentation.

The following is the schedule primitive configuration configs/_base_/schedules/imagenet_bs256.py used in the ResNet50 configuration:

optim_wrapper = dict(
    # Use SGD optimizer to optimize parameters.
    optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001))

# The tuning strategy of the learning rate.
# 'MultiStepLR' means using a multi-step policy to schedule the learning rate (LR).
param_scheduler = dict(
    type='MultiStepLR', by_epoch=True, milestones=[30, 60, 90], gamma=0.1)

# Training configuration, iterate 100 epochs, and perform validation after every training epoch.
# 'by_epoch=True' means to use `EpochBasedTrainLoop`, 'by_epoch=False' means to use `IterBasedTrainLoop`.
train_cfg = dict(by_epoch=True, max_epochs=100, val_interval=1)
# Use the default val loop settings.
val_cfg = dict()
# Use the default test loop settings.
test_cfg = dict()

# This schedule is for a total batch size of 256.
# If you use a different total batch size, like 512, and enable automatic learning rate scaling,
# the learning rate will be scaled up to 2 times.
auto_scale_lr = dict(base_batch_size=256)
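As a quick sanity check of the linear scaling rule behind auto_scale_lr (plain Python with hypothetical batch sizes):

base_lr = 0.1             # the SGD lr above, defined for base_batch_size=256
base_batch_size = 256
actual_batch_size = 512   # hypothetical, e.g. 16 GPUs x batch_size=32

scaled_lr = base_lr * actual_batch_size / base_batch_size
print(scaled_lr)          # 0.2, i.e. the learning rate is doubled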

4. Runtime settings

This part mainly includes the checkpoint saving strategy, log configuration, training parameters, resume checkpoint path, working directory, etc.
This is the runtime primitive configuration file configs/_base_/default_runtime.py used by almost all configurations:

# defaults to use registries in mmpretrain
default_scope = 'mmpretrain'

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type='IterTimerHook'),

    # print log every 100 iterations.
    logger=dict(type='LoggerHook', interval=100),

    # enable the parameter scheduler.
    param_scheduler=dict(type='ParamSchedulerHook'),

    # save checkpoint per epoch.
    checkpoint=dict(type='CheckpointHook', interval=1),

    # set sampler seed in a distributed environment.
    sampler_seed=dict(type='DistSamplerSeedHook'),

    # validation results visualization, set True to enable it.
    visualization=dict(type='VisualizationHook', enable=False),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,

    # set multi-process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),

    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
vis_backends = [dict(type='LocalVisBackend')]  # use local HDD backend
visualizer = dict(
    type='UniversalVisualizer', vis_backends=vis_backends, name='visualizer')

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False
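As an example of tweaking these runtime settings, the sketch below overrides the checkpoint hook to keep only the latest three checkpoints and additionally track the best one; max_keep_ckpts and save_best are standard MMEngine CheckpointHook options, and the values here are illustrative:

default_hooks = dict(
    checkpoint=dict(
        type='CheckpointHook',
        interval=1,          # save a checkpoint every epoch
        max_keep_ckpts=3,    # keep only the latest 3 checkpoints on disk
        save_best='auto',    # also keep the checkpoint with the best validation metric
    ))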

2. Inherit and modify configuration files

For ease of understanding, we recommend that contributors inherit from existing configuration files. But do not abuse inheritance: in general, we recommend a maximum inheritance depth of 3 for all configuration files.

For example, if your config file is based on ResNet with a few modifications, you can first inherit the basic ResNet structure, dataset and other training settings, and then modify the configuration file. A more concrete example: suppose we want to keep almost all the settings in configs/resnet/resnet50_8xb32_in1k.py, but use CutMix batch augmentation, change the number of training epochs from 100 to 300, modify when the learning rate decays, and change the dataset path. We can create a new configuration file configs/resnet/resnet50_8xb32-300e_in1k.py with the following content:

# create this file under 'configs/resnet/' folder
_base_ = './resnet50_8xb32_in1k.py'

# using CutMix batch augment
model = dict(
    train_cfg=dict(
        augments=dict(type='CutMix', alpha=1.0)   # add CutMix batch augmentation
    )
)

# trains more epochs
train_cfg = dict(max_epochs=300, val_interval=10)  # Train for 300 epochs, evaluate every 10 epochs
param_scheduler = dict(milestones=[150, 200, 250])   # The learning rate schedule is adjusted accordingly

# Use your own dataset directory
train_dataloader = dict(
    dataset=dict(data_root='mydata/imagenet/train'),
)
val_dataloader = dict(
    batch_size=64,                  # No back-propagation during validation, larger batch size can be used
    dataset=dict(data_root='mydata/imagenet/val'),
)
test_dataloader = dict(
    batch_size=64,                  # No back-propagation during test, larger batch size can be used
    dataset=dict(data_root='mydata/imagenet/val'),
)

3. Modify the configuration on the command line

When you use the scripts tools/train.py or tools/test.py to submit a task, or use some other tools, you can directly modify the content of the configuration file used by specifying the --cfg-options argument.

Update a config key along the dictionary chain.

Configuration options can be specified following the order of the dictionary keys in the original configuration. For example, to change the norm_eval setting of all BN modules in the model backbone:

--cfg-options model.backbone.norm_eval=False

Update a key inside a config list.

Some configuration dictionaries are composed as a list in your configuration. For example, the training pipeline data.train.pipeline is usually a list like the one below. If you want to change 'flip_prob=0.5' to 'flip_prob=0.0' in the pipeline, you can specify --cfg-options data.train.pipeline.1.flip_prob=0.0.

[dict(type='LoadImageFromFile'), dict(type='TopDownRandomFlip', flip_prob=0.5), ...]

Update the value of a list/tuple.

If the value to update is a list or tuple, for example, a configuration file usually contains a setting like the first line below; to change the topk field, you can specify the override shown in the second line. Note that the quotes " are required to support list/tuple data types, and no spaces are allowed inside the quotes of the specified value.

val_evaluator = dict(type='Accuracy', topk=(1, 5))
--cfg-options val_evaluator.topk="(1,3)"
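Putting these patterns together, a hypothetical invocation that combines several overrides (the config path is just an example) could look like:

python tools/train.py configs/resnet/resnet50_8xb32_in1k.py \
    --cfg-options model.backbone.norm_eval=False \
                  val_evaluator.topk="(1,3)"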

4. Inference with existing models

This tutorial will show how to use the following APIs:

list_models: Lists the model names available in MMPreTrain.
get_model: Get a model from model name or model configuration.
inference_model: Inference the model with the corresponding inferencer.
These are shortcuts to get started quickly; for advanced usage, use the inferencers below directly.

Inferencer:
ImageClassificationInferencer: Performs image classification on a given image.
ImageRetrievalInferencer: Performs image-to-image retrieval from a given image on a given image set.
ImageCaptionInferencer: Generates a caption on a given image.
VisualQuestionAnsweringInferencer: Answers questions based on a given picture.
VisualGroundingInferencer: Locates an object in a given image from a description.
TextToImageRetrievalInferencer: Performs text-to-image retrieval based on a given description for a given set of images.
ImageToTextRetrievalInferencer: Performs image-to-text retrieval on a sequence of text from a given image.
NLVRInferencer: Performs natural language visual inference on a given image pair and text.
FeatureExtractor: Extract features from image files via a visual backbone.

List available models

List all models in MMPreTrain.

from mmpretrain import list_models
list_models()

list_models supports Unix filename pattern matching; you can use * to match any characters.

from mmpretrain import list_models
list_models("*convnext-b*21k")

You can also use the list_models method of an inferencer to get the available models for the corresponding task.

from mmpretrain import ImageCaptionInferencer
ImageCaptionInferencer.list_models()

Get a model

You can use get_model to get a model.

from mmpretrain import get_model

# build the model from its name, without loading pre-trained weights
model = get_model("convnext-base_in21k-pre_3rdparty_in1k")

# build the model and load the matching pre-trained weights
model = get_model("convnext-base_in21k-pre_3rdparty_in1k", pretrained=True)

# load weights from a local checkpoint file
model = get_model("convnext-base_in21k-pre_3rdparty_in1k", pretrained="your_local_checkpoint_path")

# override model components, e.g. change the number of classes in the head
model = get_model("convnext-base_in21k-pre_3rdparty_in1k", head=dict(num_classes=10))

# remove the neck and head, and output features from three backbone stages
model_headless = get_model("resnet18_8xb32_in1k", head=None, neck=None, backbone=dict(out_indices=(1, 2, 3)))
The obtained model is an ordinary PyTorch module, so you can run a forward pass directly:

import torch
from mmpretrain import get_model

model = get_model('convnext-base_in21k-pre_3rdparty_in1k', pretrained=True)
x = torch.rand((1, 3, 224, 224))
y = model(x)
print(type(y), y.shape)

Inference on a given image

Below is an example of running inference on an image with the ResNet-50 pre-trained classification model.

from mmpretrain import inference_model
image = 'https://github.com/open-mmlab/mmpretrain/raw/main/demo/demo.JPEG'
# If you have no graphical interface, please set `show=False`
result = inference_model('resnet50_8xb32_in1k', image, show=True)
print(result['pred_class'])

The inference_model API is for quick demonstration only; it cannot keep the model instance or run inference on multiple samples. You can use the inferencers for multiple calls.

from mmpretrain import ImageClassificationInferencer

image = 'https://github.com/open-mmlab/mmpretrain/raw/main/demo/demo.JPEG'
inferencer = ImageClassificationInferencer('resnet50_8xb32_in1k')
# Note that the inferencer output is a list of results, even if the input is a single sample.
result = inferencer(image)[0]
print(result['pred_class'])

# You can also feed a list of images for batched inference.
image_list = ['demo/demo.JPEG', 'demo/bird.JPEG'] * 16
results = inferencer(image_list, batch_size=8)
print(len(results))
print(results[1]['pred_class'])

Typically, the result for each sample is a dictionary. For example, an image classification result is a dictionary containing pred_label, pred_score, pred_scores and pred_class, as follows:

{
    "pred_label": 65,
    "pred_score": 0.6649366617202759,
    "pred_class": "sea snake",
    "pred_scores": array([..., 0.6649366617202759, ...], dtype=float32)
}

You can configure the inferencer via parameters, for example, to run inference on CUDA using your own configuration file and checkpoint:

from mmpretrain import ImageClassificationInferencer
image = 'https://github.com/open-mmlab/mmpretrain/raw/main/demo/demo.JPEG'
config = 'configs/resnet/resnet50_8xb32_in1k.py'
checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth'
inferencer = ImageClassificationInferencer(model=config, pretrained=checkpoint, device='cuda')
result = inferencer(image)[0]
print(result['pred_class'])

Inference with Gradio Demo

We also provide a Gradio demo for all supported tasks, which you can find in projects/gradio_demo/launch.py.
Please install gradio first:

pip install "gradio>=3.31.0"


Extract features from images

Compared to model.extract_feat, FeatureExtractor extracts features directly from image files instead of from a batch of tensors. In short, the input of model.extract_feat is a torch.Tensor, while the input of FeatureExtractor is an image path.

from mmpretrain import FeatureExtractor, get_model

# output the features of all four ResNet stages
model = get_model('resnet50_8xb32_in1k', backbone=dict(out_indices=(0, 1, 2, 3)))
extractor = FeatureExtractor(model)
features = extractor('https://github.com/open-mmlab/mmpretrain/raw/main/demo/demo.JPEG')[0]
features[0].shape, features[1].shape, features[2].shape, features[3].shape

5. Model training

In this tutorial, we describe how to use the scripts provided in MMPretrain to start a training job. We also provide practical examples on how to pre-train and how to fine-tune with custom datasets.

Use your computer for training

You can use tools/train.py which trains the model on a single machine with CPU and optional GPU.

Here is the full usage of the script:

python tools/train.py ${CONFIG_FILE} [ARGS]

By default, MMPretrain prefers GPU over CPU. If you want to train the model on the CPU, please clear or set CUDA_VISIBLE_DEVICES to -1 to make the GPU invisible to the program.

CUDA_VISIBLE_DEVICES=-1 python tools/train.py ${CONFIG_FILE} [ARGS]

CONFIG_FILE Path to configuration file.

--work-dir WORK_DIR Target folder to save logs and checkpoints. Defaults to a folder under ./work_dirs with the same name as the config file.

--resume [RESUME] Resume training. If a path is specified, resume from that path; if not specified, try to resume automatically from the latest checkpoint.

--amp enables automatic mixed precision training.

--no-validate Not recommended. Disable checkpoint evaluation during training.

--auto-scale-lr automatically scales the learning rate based on the actual batch size and the original batch size.

--no-pin-memory Whether to disable the pin_memory option in the data loader.

--no-persistent-workers Whether to disable the persistent_workers option in the data loader.

--cfg-options CFG_OPTIONS Override some settings in the used configuration; key-value pairs in the format xxx=yyy will be merged into the configuration file. If the value to override is a list, it should be of the form key="[a,b]" or key=a,b. The parameter also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]". Note that quotes are required and no spaces are allowed.

--launcher {none,pytorch,slurm,mpi}, options for the job launcher.
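For instance, a hypothetical run combining several of these options (paths and values are illustrative):

python tools/train.py configs/resnet/resnet50_8xb32_in1k.py \
    --work-dir work_dirs/resnet50_custom \
    --amp \
    --cfg-options train_dataloader.batch_size=64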

Use multiple GPUs for training

We provide a shell script that uses torch.distributed.launch to start multi-GPU training tasks:

bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [PY_ARGS]

You can also specify extra arguments for the launcher via environment variables. For example, change the launcher's communication port to 29666 with the following command:

PORT=29666 bash ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [PY_ARGS]

If you want to start multiple training jobs and use different GPUs, you can start them by specifying different ports and visible devices.

CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash ./tools/dist_train.sh ${CONFIG_FILE1} 4 [PY_ARGS]
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash ./tools/dist_train.sh ${CONFIG_FILE2} 4 [PY_ARGS]

How to use a custom dataset for pre-training

In this tutorial, we provide a practice example and some tips on how to train on your own dataset.
In MMPretrain, we support CustomDataset (similar to ImageFolder in torchvision), which can directly read the images in a specified folder. You only need to prepare the path information of the custom dataset and edit the config.

Step 1: Prepare the dataset

Follow Preparing Datasets to prepare your dataset. The root folder of the dataset can look like data/custom_dataset/.
Here, we assume that you want to do unsupervised training, and use the sub-folder format CustomDataset to organize the dataset as:

data/custom_dataset/
├── sample1.png
├── sample2.png
├── sample3.png
├── sample4.png
└── ...

Step 2: Select a configuration as a template

Here, we take configs/mae/mae_vit-base-p16_8xb512-amp-coslr-300e_in1k.py as an example. We first copy this configuration file to the same folder and rename it to mae_vit-base-p16_8xb512-amp-coslr-300e_custom.py.
The content of this configuration is:

_base_ = [
    '../_base_/models/mae_vit-base-p16.py',
    '../_base_/datasets/imagenet_bs512_mae.py',
    '../_base_/default_runtime.py',
]

# optimizer wrapper
optim_wrapper = dict(
    type='AmpOptimWrapper',
    loss_scale='dynamic',
    optimizer=dict(
        type='AdamW',
        lr=1.5e-4 * 4096 / 256,
        betas=(0.9, 0.95),
        weight_decay=0.05),
    paramwise_cfg=dict(
        custom_keys={
            'ln': dict(decay_mult=0.0),
            'bias': dict(decay_mult=0.0),
            'pos_embed': dict(decay_mult=0.),
            'mask_token': dict(decay_mult=0.),
            'cls_token': dict(decay_mult=0.)
        }))

# learning rate scheduler
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=0.0001,
        by_epoch=True,
        begin=0,
        end=40,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=260,
        by_epoch=True,
        begin=40,
        end=300,
        convert_to_iter_based=True)
]

# runtime settings
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=300)
default_hooks = dict(
    # only keeps the latest 3 checkpoints
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))

randomness = dict(seed=0, diff_rank_seed=True)

# auto resume
resume = True

# NOTE: `auto_scale_lr` is for automatically scaling LR
# based on the actual training batch size.
auto_scale_lr = dict(base_batch_size=4096)

Step 3: Edit dataset-related configuration

Override the dataset type setting to 'CustomDataset'.
Override the dataset data_root setting to data/custom_dataset.
Override the dataset ann_file setting to an empty string, since we assume you are using the sub-folder format of CustomDataset.
Override the dataset data_prefix setting to an empty string, because we use the whole dataset under data_root and do not need to split the samples into different subsets.

The modified configuration will look like:

_base_ = [
    '../_base_/models/mae_vit-base-p16.py',
    '../_base_/datasets/imagenet_bs512_mae.py',
    '../_base_/default_runtime.py',
]

# >>>>>>>>>>>>>>> Override dataset settings here >>>>>>>>>>>>>>>>>>>
train_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root='data/custom_dataset/',
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='',    # The `data_root` is the data_prefix directly.
        with_label=False,
    )
)
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

# optimizer wrapper
optim_wrapper = dict(
    type='AmpOptimWrapper',
    loss_scale='dynamic',
    optimizer=dict(
        type='AdamW',
        lr=1.5e-4 * 4096 / 256,
        betas=(0.9, 0.95),
        weight_decay=0.05),
    paramwise_cfg=dict(
        custom_keys={
            'ln': dict(decay_mult=0.0),
            'bias': dict(decay_mult=0.0),
            'pos_embed': dict(decay_mult=0.),
            'mask_token': dict(decay_mult=0.),
            'cls_token': dict(decay_mult=0.)
        }))

# learning rate scheduler
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=0.0001,
        by_epoch=True,
        begin=0,
        end=40,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=260,
        by_epoch=True,
        begin=40,
        end=300,
        convert_to_iter_based=True)
]

# runtime settings
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=300)
default_hooks = dict(
    # only keeps the latest 3 checkpoints
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))

randomness = dict(seed=0, diff_rank_seed=True)

# auto resume
resume = True

# NOTE: `auto_scale_lr` is for automatically scaling LR
# based on the actual training batch size.
auto_scale_lr = dict(base_batch_size=4096)

By using the edited configuration file, you can train a self-supervised model using the MAE algorithm on a custom dataset.

Following the above ideas, we also provide an example of how to train MAE on the COCO dataset. The edited file will look like this:

_base_ = [
    '../_base_/models/mae_vit-base-p16.py',
    '../_base_/datasets/imagenet_mae.py',
    '../_base_/default_runtime.py',
]

# >>>>>>>>>>>>>>> Override dataset settings here >>>>>>>>>>>>>>>>>>>
train_dataloader = dict(
    dataset=dict(
        type='mmdet.CocoDataset',
        data_root='data/coco/',
        ann_file='annotations/instances_train2017.json',  # Only for loading images, and the labels won't be used.
        data_prefix=dict(img='train2017/'),
    )
)
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

# optimizer wrapper
optim_wrapper = dict(
    type='AmpOptimWrapper',
    loss_scale='dynamic',
    optimizer=dict(
        type='AdamW',
        lr=1.5e-4 * 4096 / 256,
        betas=(0.9, 0.95),
        weight_decay=0.05),
    paramwise_cfg=dict(
        custom_keys={
            'ln': dict(decay_mult=0.0),
            'bias': dict(decay_mult=0.0),
            'pos_embed': dict(decay_mult=0.),
            'mask_token': dict(decay_mult=0.),
            'cls_token': dict(decay_mult=0.)
        }))

# learning rate scheduler
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=0.0001,
        by_epoch=True,
        begin=0,
        end=40,
        convert_to_iter_based=True),
    dict(
        type='CosineAnnealingLR',
        T_max=260,
        by_epoch=True,
        begin=40,
        end=300,
        convert_to_iter_based=True)
]

# runtime settings
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=300)
default_hooks = dict(
    # only keeps the latest 3 checkpoints
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=3))

randomness = dict(seed=0, diff_rank_seed=True)

# auto resume
resume = True

# NOTE: `auto_scale_lr` is for automatically scaling LR
# based on the actual training batch size.
auto_scale_lr = dict(base_batch_size=4096)

How to fine-tune with a custom dataset

Step 1: Prepare the dataset

Follow Preparing Datasets to prepare your dataset. The root folder of the dataset can look like data/custom_dataset/.
Here, we assume that you want to do supervised image classification training, and use the sub-folder format CustomDataset to organize the dataset as:

data/custom_dataset/
├── train
│   ├── class_x
│   │   ├── x_1.png
│   │   ├── x_2.png
│   │   ├── x_3.png
│   │   └── ...
│   ├── class_y
│   └── ...
└── test
    ├── class_x
    │   ├── test_x_1.png
    │   ├── test_x_2.png
    │   ├── test_x_3.png
    │   └── ...
    ├── class_y
    └── ...

Step 2: Select a configuration as a template

Here, we take configs/resnet/resnet50_8xb32_in1k.py as an example. We first copy this configuration file to the same folder and rename it to resnet50_8xb32-ft_custom.py.

_base_ = [
    '../_base_/models/resnet50.py',           # model settings
    '../_base_/datasets/imagenet_bs32.py',    # data settings
    '../_base_/schedules/imagenet_bs256.py',  # schedule settings
    '../_base_/default_runtime.py',           # runtime settings
]
Step 3: Edit model settings

When fine-tuning a model, we usually want to load the pre-trained backbone weights and train a new classification head from scratch.

To load the pre-trained backbone, we need to change the initialization configuration of the backbone and use the Pretrained initialization function. Also, in init_cfg we use prefix='backbone' to tell the initialization function the prefix of the submodule to load from the checkpoint.

For example, backbone here means loading the backbone submodule. Here we use an online checkpoint, which is downloaded automatically during training; you can also download the model manually and use a local path. Then we need to modify the head according to the number of classes of the new dataset, by simply changing num_classes in the head.

When the new dataset is small and shares a domain with the pre-trained dataset, we may want to freeze the parameters of the first few stages of the backbone, which helps the network maintain its ability to extract low-level information learned during pre-training. In MMPretrain, you can simply specify the number of stages to freeze with the frozen_stages argument. For example, to freeze the parameters of the first two stages, just use the following configuration.
Note that not all backbones support the frozen_stages argument so far. Please check the documentation to see whether your backbone supports it.

_base_ = [
    '../_base_/models/resnet50.py',           # model settings
    '../_base_/datasets/imagenet_bs32.py',    # data settings
    '../_base_/schedules/imagenet_bs256.py',  # schedule settings
    '../_base_/default_runtime.py',           # runtime settings
]

# >>>>>>>>>>>>>>> Override model settings here >>>>>>>>>>>>>>>>>>>
model = dict(
    backbone=dict(
        frozen_stages=2,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth',
            prefix='backbone',
        )),
    head=dict(num_classes=10),
)
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Here we only need to set the part of the configuration we want to modify, because the inherited configuration will be merged and get the whole configuration.

Step 4: Edit Dataset Settings

In order to fine-tune on a new dataset, we need to override some dataset settings, such as the dataset type, data pipeline, etc.

_base_ = [
    '../_base_/models/resnet50.py',           # model settings
    '../_base_/datasets/imagenet_bs32.py',    # data settings
    '../_base_/schedules/imagenet_bs256.py',  # schedule settings
    '../_base_/default_runtime.py',           # runtime settings
]

# model settings
model = dict(
    backbone=dict(
        frozen_stages=2,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth',
            prefix='backbone',
        )),
    head=dict(num_classes=10),
)

# >>>>>>>>>>>>>>> Override data settings here >>>>>>>>>>>>>>>>>>>
data_root = 'data/custom_dataset'
train_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='train',
    ))
val_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='test',
    ))
test_dataloader = val_dataloader
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Step 5: Edit schedule settings (optional)

Fine-tuning hyper-parameters differ from the default schedule: fine-tuning usually requires a smaller learning rate and the learning rate decays over fewer epochs.

_base_ = [
    '../_base_/models/resnet50.py',           # model settings
    '../_base_/datasets/imagenet_bs32.py',    # data settings
    '../_base_/schedules/imagenet_bs256.py',  # schedule settings
    '../_base_/default_runtime.py',           # runtime settings
]

# model settings
model = dict(
    backbone=dict(
        frozen_stages=2,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth',
            prefix='backbone',
        )),
    head=dict(num_classes=10),
)

# data settings
data_root = 'data/custom_dataset'
train_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='train',
    ))
val_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='test',
    ))
test_dataloader = val_dataloader

# >>>>>>>>>>>>>>> Override schedule settings here >>>>>>>>>>>>>>>>>>>
# optimizer hyper-parameters
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))
# learning policy
param_scheduler = dict(
    type='MultiStepLR', by_epoch=True, milestones=[15], gamma=0.1)
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Step 6: Start training

Now, we have completed the fine-tuning configuration file, as follows:

_base_ = [
    '../_base_/models/resnet50.py',           # model settings
    '../_base_/datasets/imagenet_bs32.py',    # data settings
    '../_base_/schedules/imagenet_bs256.py',  # schedule settings
    '../_base_/default_runtime.py',           # runtime settings
]

# model settings
model = dict(
    backbone=dict(
        frozen_stages=2,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth',
            prefix='backbone',
        )),
    head=dict(num_classes=10),
)

# data settings
data_root = 'data/custom_dataset'
train_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='train',
    ))
val_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='test',
    ))
test_dataloader = val_dataloader

# schedule settings
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))
param_scheduler = dict(
    type='MultiStepLR', by_epoch=True, milestones=[15], gamma=0.1)

Here we use 8 GPUs on the computer to train the model with the following command:

bash tools/dist_train.sh configs/resnet/resnet50_8xb32-ft_custom.py 8

Additionally, you can train the model using only one GPU with the following command:

python tools/train.py configs/resnet/resnet50_8xb32-ft_custom.py

But wait: if you use one GPU, you need to change one important configuration. Change the dataset configuration as follows:

data_root = 'data/custom_dataset'
train_dataloader = dict(
    batch_size=256,
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='train',
    ))
val_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root=data_root,
        ann_file='',       # We assume you are using the sub-folder format without ann_file
        data_prefix='test',
    ))
test_dataloader = val_dataloader

This is because our training schedule assumes a total batch size of 256. With 8 GPUs, each GPU just uses batch_size=32 from the base configuration file, and the total batch size is 256. But with 1 GPU, you need to change the batch size manually to 256 to match the training schedule.

However, a larger batch size requires more GPU memory. Here are a few simple tricks to save GPU memory:

Enable automatic mixed precision training:

python tools/train.py configs/resnet/resnet50_8xb32-ft_custom.py --amp

Use a smaller batch size (e.g. batch_size=32 instead of 256) and enable automatic learning rate scaling:

python tools/train.py configs/resnet/resnet50_8xb32-ft_custom.py --auto-scale-lr

Automatic learning rate scaling adjusts the learning rate according to the actual total batch size and auto_scale_lr.base_batch_size (you can find the latter in the base config configs/_base_/schedules/imagenet_bs256.py).

6. Model testing

For image classification and image retrieval tasks, you can test your model after training.

Use your computer to test

You can use tools/test.py to test the model on a single computer with CPU and optional GPU.
Here is the full usage of the script:

python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]

By default, MMPretrain prefers GPU over CPU. If you want to test the model on the CPU, please clear or set CUDA_VISIBLE_DEVICES to -1 to make the GPU invisible to the program.

CUDA_VISIBLE_DEVICES=-1 python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]

CONFIG_FILE Path to configuration file.

CHECKPOINT_FILE Path to the checkpoint file (it can be an HTTP link; you can find checkpoints in the model zoo).
--work-dir WORK_DIR Directory to save files containing evaluation metrics.
--out OUT Path to save file containing test results.
--out-item OUT_ITEM Specifies the content of the test result file, which can be "pred" or "metrics". If 'pred', save the output of the model for offline evaluation. If 'metrics', save the evaluation metrics. Defaults to "pred".
--cfg-options CFG_OPTIONS Override some settings in the used configuration; key-value pairs in the format xxx=yyy will be merged into the configuration file. If the value to override is a list, it should be of the form key="[a,b]" or key=a,b. The parameter also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]". Note that quotes are required and no spaces are allowed.

--show-dir SHOW_DIR Directory to save result visualization images.
--show visualizes the prediction results in a window.

--interval INTERVAL Sample interval to visualize.

--wait-time WAIT_TIME Display time for each window in seconds. The default is 1.
--no-pin-memory Whether to disable the pin_memory option in the data loader.

--tta Whether to enable test-time augmentation (TTA). If the configuration file has the tta_pipeline and tta_model fields, use them to determine the TTA transforms and how to merge the TTA results. Otherwise, flip TTA is applied by averaging the classification scores.
--launcher {none,pytorch,slurm,mpi}, options for the job launcher.

Test with multiple GPUs

bash ./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [PY_ARGS]
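For example, a hypothetical test run on 8 GPUs that also saves the raw predictions (the checkpoint path is illustrative):

bash ./tools/dist_test.sh configs/resnet/resnet50_8xb32_in1k.py \
    work_dirs/resnet50/epoch_100.pth 8 --out results.pkl --out-item pred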

7. Downstream tasks

Detection

For detection tasks, use MMDetection. First, make sure you have installed MIM, which is also a project of OpenMMLab.

pip install openmim
mim install 'mmdet>=3.0.0rc0'

Once installed, you can run MMDetection with a simple command.

# distributed version
bash tools/benchmarks/mmdetection/mim_dist_train_c4.sh ${CONFIG} ${PRETRAIN} ${GPUS}
bash tools/benchmarks/mmdetection/mim_dist_train_fpn.sh ${CONFIG} ${PRETRAIN} ${GPUS}

# slurm version
bash tools/benchmarks/mmdetection/mim_slurm_train_c4.sh ${PARTITION} ${CONFIG} ${PRETRAIN}
bash tools/benchmarks/mmdetection/mim_slurm_train_fpn.sh ${PARTITION} ${CONFIG} ${PRETRAIN}

${CONFIG}: Directly use the configuration file path in MMDetection. For some algorithms, we also have some modified configuration files, which can be found in the benchmarks folder under the corresponding algorithm folder. You can also write configuration files from scratch.
${PRETRAIN}: pre-trained model file.
${GPUS}: The number of GPUs you want to use for training. We use 8 GPUs by default to perform detection tasks.

bash ./tools/benchmarks/mmdetection/mim_dist_train_c4.sh \
  configs/byol/benchmarks/mask-rcnn_r50-c4_ms-1x_coco.py \
  https://download.openmmlab.com/mmselfsup/1.x/byol/byol_resnet50_16xb256-coslr-200e_in1k/byol_resnet50_16xb256-coslr-200e_in1k_20220825-de817331.pth 8

After training, you can also run the following command to test your model.

# distributed version
bash tools/benchmarks/mmdetection/mim_dist_test.sh ${CONFIG} ${CHECKPOINT} ${GPUS}

# slurm version
bash tools/benchmarks/mmdetection/mim_slurm_test.sh ${PARTITION} ${CONFIG} ${CHECKPOINT}

${CONFIG}: Directly use the configuration file name in MMDetection. For some algorithms, we also have some modified configuration files, which can be found in the benchmarks folder under the corresponding algorithm folder. You can also write configuration files from scratch.

${CHECKPOINT}: The fine-tuned detection model you want to test.

${GPUS}: The number of GPUs you want to use for testing. We use 8 GPUs by default to perform detection tasks.

bash ./tools/benchmarks/mmdetection/mim_dist_test.sh \
configs/byol/benchmarks/mask-rcnn_r50_fpn_ms-1x_coco.py \
https://download.openmmlab.com/mmselfsup/1.x/byol/byol_resnet50_16xb256-coslr-200e_in1k/byol_resnet50_16xb256-coslr-200e_in1k_20220825-de817331.pth 8

Segmentation

For the semantic segmentation task, we use MMSegmentation. First, make sure you have installed MIM, which is also a project of OpenMMLab.

pip install openmim
mim install 'mmsegmentation>=1.0.0rc0'

Model training
After installation, you can run MMSegmentation with a simple command.

# distributed version
bash tools/benchmarks/mmsegmentation/mim_dist_train.sh ${CONFIG} ${PRETRAIN} ${GPUS}

# slurm version
bash tools/benchmarks/mmsegmentation/mim_slurm_train.sh ${PARTITION} ${CONFIG} ${PRETRAIN}

${CONFIG}: Directly use the configuration file path in MMSegmentation. For some algorithms, we also have some modified configuration files, which can be found in the benchmarks folder under the corresponding algorithm folder. You can also write configuration files from scratch.
${PRETRAIN}: pre-trained model file.
${GPUS}: The number of GPUs you want to use for training. We default to 4 GPUs for segmentation tasks.

bash ./tools/benchmarks/mmsegmentation/mim_dist_train.sh \
configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_4xb4-20k_voc12aug-512x512.py \
https://download.openmmlab.com/mmselfsup/1.x/byol/byol_resnet50_16xb256-coslr-200e_in1k/byol_resnet50_16xb256-coslr-200e_in1k_20220825-de817331.pth 4

Model Testing
After training, you can also run the following command to test your model.

# distributed version
bash tools/benchmarks/mmsegmentation/mim_dist_test.sh ${CONFIG} ${CHECKPOINT} ${GPUS}

# slurm version
bash tools/benchmarks/mmsegmentation/mim_slurm_test.sh ${PARTITION} ${CONFIG} ${CHECKPOINT}

${CONFIG}: Directly use the configuration file name in MMSegmentation. For some algorithms, we also have some modified configuration files, which can be found in the benchmarks folder under the corresponding algorithm folder. You can also write configuration files from scratch.

${CHECKPOINT}: The fine-tuned segmentation model you want to test.

${GPUS}: The number of GPUs you want to use for testing. We default to 4 GPUs for segmentation tasks.

bash ./tools/benchmarks/mmsegmentation/mim_dist_test.sh \
configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_4xb4-20k_voc12aug-512x512.py \
https://download.openmmlab.com/mmselfsup/1.x/byol/byol_resnet50_16xb256-coslr-200e_in1k/byol_resnet50_16xb256-coslr-200e_in1k_20220825-de817331.pth 4
