Open Source Pre-training Framework MMPRETRAIN Official Documentation (Advanced Guide)

1. Prepare the data set

1. Custom data set (the following are custom data set preparations for classified data)

CustomDataset is a generic dataset class for you to use your own datasets. To use CustomDataset, you need to organize dataset files according to the following two formats:

1. Subfolder format

In this format, you only need to reorganize the dataset folders and put all samples in one folder without creating any annotation files.

1. For supervision tasks (with_label=True),

We use the name of the subfolder as the class name , as shown in the example below, class_x, class_y are recognized as class names.
data_prefix/
├── class_x
│ ├── xxx.png
│ ├── xxy.png
│ └── …
│ └── xxz.png
└── class_y
├── 123.png
├── nsdf3.png
├─ ─ …
└── asd932_.png

insert image description here
insert image description here

2. For unsupervised tasks (with_label=False),

We directly load all sample files under the specified folder:
data_prefix/
├── folder_1
│ ├── xxx.png
│ ├── xxy.png
│ └── …
├── 123.png
├── nsdf3.png
└ ── ...
Assuming you want to use this as a training dataset, here is the configuration in your config file.

train_dataloader = dict(
    ...
    # Training dataset configurations
    dataset=dict(
        type='CustomDataset',
        data_prefix='path/to/data_prefix',
        with_label=True,   # or False for unsupervised tasks
        pipeline=...
    )
)

If you want to use this format, do not specify ann_file, or specify ann_file=''.

2. Text annotation file format

In this format, we use text annotation files to store image file paths and corresponding category indexes.

1. For supervision tasks (with_label=True),

Annotation files should contain the file path and the category index of a sample on a single line, separated by spaces, as follows:
All these file paths can be absolute or relative to the data_prefix.

folder_1/xxx.png 0
folder_1/xxy.png 1
123.png 4
nsdf3.png 3
...

The index number of the category starts from 0. And the value of the ground truth label should fall in the range [0, num_classes - 1].
Also, use the classes field in the dataset setting to specify the name of each class.

2. For unsupervised tasks (with_label=False),

The annotation file only needs one line to contain the file path of a sample, as follows:
folder_1/xxx.png
folder_1/xxy.png
123.png
nsdf3.png
...
assuming the entire dataset folder is as follows:
data_root
├── meta
│ ├── test. txt # The annotation file for the test dataset
│ ├── train.txt # The annotation file for the training dataset
│ └── val.txt # The annotation file for the validation dataset.
├── train
│ ├── 123. png
│ ├── folder_1
│ │ ├── xxx.png
│ │ └── xxy.png
│ └── nsdf3.png
├── test
└── val
The following is an example of dataset settings in the configuration file:

# Training dataloader configurations
train_dataloader = dict(
    dataset=dict(
        type='CustomDataset',
        data_root='path/to/data_root',  # The common prefix of both `ann_flie` and `data_prefix`.
        ann_file='meta/train.txt',      # The path of annotation file relative to the data_root.
        data_prefix='train',            # The prefix of file paths in the `ann_file`, relative to the data_root.
        with_label=True,                # or False for unsupervised tasks
        classes=['A', 'B', 'C', 'D', ...],  # The name of every category.
        pipeline=...,    # The transformations to process the dataset samples.
    )
    ...
)

2. The directory structure of the imageNet dataset

We support two ways of organizing ImageNet datasets: a subfolder format and a text annotation file format.

Subfolder Format
We provide a sample that you can download and extract from this link. The directory structure of the dataset should look like this:
data/imagenet/
├── train/
│ ├── n01440764
│ │
├── n01440764_10026.JPEG │ │
├── n01440764_10027.JPEG │ │ ├── n01440764_10029.JPEG
│ │ ... ├──
n01440764 ├── _ _ ILSVRC2012_val_00000293.JPEG │ │ ├── ILSVRC2012_val_00002138.JPEG │ │ ├── ILSVRC2012_val_00003014.JPEG │ │ └── … │ ├── …









Text Annotation File Format
You can download and extract the metadata from this link. And reorganize the dataset as follows:
data/imagenet/
├── meta/
│ ├── train.txt
│ ├── test.txt
│ └── val.txt
├── train/
│ ├── n01440764 ...
764_10042.JPEG
│ │
├──
n01440764_10043.JPEG
│ └── n01440764_10048.JPEG │ ├── … ├── val/ │ ├── ILSVRC2012_val_00000001.JPEG │ ├── ILSVRC2012_val_00000002.JPEG │ ├── ILSVRC2012_val_00000003. JPEG │ ├── ILSVRC2012_val_00000004. JPEG │ ├── …








Configuration
After organizing the dataset as above, you can use ImageNet dataset with the following configuration:

train_dataloader = dict(
    ...
    # Training dataset configurations
    dataset=dict(
        type='ImageNet',
        data_root='data/imagenet',
        split='train',
        pipeline=...,
    )
)

val_dataloader = dict(
    ...
    # Validation dataset configurations
    dataset=dict(
        type='ImageNet',
        data_root='data/imagenet',
        split='val',
        pipeline=...,
    )
)

test_dataloader = val_dataloader

3. Other supported data details

To find more datasets supported by MMPretrain, and for more configurations for the above datasets, see

2. Custom data pipeline

1. Data pipeline design

In the new dataset tutorial, we know that the dataset class uses the load_data_list method to initialize the entire dataset and save the information of each sample into a dictionary.

Usually, to save memory usage, we only load the image paths and tags in load_data_list, and load the full image content when used . Also, we might want to do some random data augmentation when picking samples at training time. Almost all data loading, preprocessing and formatting operations are configurable in MMPretrain via data pipelines.

The data pipeline means how to process the sample dictionary when indexing samples from the dataset. It consists of a series of data transformations. Each data transformation takes a dictionary as input, processes it, and outputs a dictionary for the next data transformation.

Below is an example data pipeline for training ResNet-50 on ImageNet.

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

2. All available data transformations in MMPretrain can be found in the data transformation documentation

In MMPreTrain, data processing and datasets are decomposed. Datasets only define basic information on how to fetch samples from the file system. These basic information include ground truth labels and raw image data/image paths. Data processing includes data transformation, data preprocessing, and batch augmentation.

Data Transforms: Transformation includes loading, preprocessing, formatting, etc. Data
Preprocessors: Processes include collation, normalization, stacking, channel flipping, etc. Batch
Augmentations: Batch augmentation involves multiple samples, such as Mixup and CutMix.

1. Data Transforms, data conversion

To prepare the input data, we need to do some transformations on this basic information. These transformations include loading, preprocessing, and formatting. A series of data transformations make up a data pipeline. Therefore, you can find a pipeline parameter in the configuration of the dataset, for example:

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

train_dataloader = dict(
    ....
    dataset=dict(
        pipeline=train_pipeline,
        ....),
    ....
)

Each item in the pipeline list is one of the following data transformation classes. If you want to add custom data transformation classes, the tutorial Custom Data Pipelines will help you.

Loading and Formatting, Loading and Formatting
Processing and Augmentation, Processing and Enhancement
MMCVtransforms, MMCV Transform
Transform Wrapper, Transform Wrapper TorchVision
Transforms, TorchVision Transform

1. Loading and Formatting

insert image description here

LoadImageFromFile load image from file
PackInputs pack input data
PackMultiTaskInputs Convert all image labels of a multi-task dataset into a dictionary of tensors
PILToNumpy Convert img to numpy.ndarray.
NumpyToPIL Convert image from OpenCV format to PIL.Image.Image…
Transpose Transpose a numpy array...
Collect Collect and keep only specified fields

2. Processing and Augmentation (processing and enhancement)

insert image description here

Albumentations Using enhanced wrappers from the albumentations library.
CenterCrop Crop the image's center, segmentation mask, bounding box, and keypoints.
ColorJitter Randomly vary the brightness, contrast, and saturation of an image.
EfficientNetCenterCrop EfficientNet-style center cropping.
EfficientNetRandomCrop EfficientNet-style RandomResizedCrop.
Lighting Image illumination adjustment using AlexNet-style PCA dithering.
Normalize Normalize images.
RandomCrop Crop the given image at a random position.
RandomErasing Randomly select a rectangular area in the image and erase pixels.
RandomFlip Flip images, bboxes, keypoints and segmentation maps.
RandomGrayscale Randomly converts the image to grayscale with probability.
RandomResize Randomly resize images, bboxes and keys.
RandomResizedCrop Crop the given image to a random ratio and aspect ratio.
Resize Crop the given image to a random ratio and aspect ratio.
ResizeEdge Resizes the image along the specified edges.
BEiTMaskGenerator Generate an image mask.
SimMIMMaskGenerator Generate a random block mask for each image.

3. Processing and Augmentation (processing and enhancement) combination enhancement

Combinatorial augmentation is a method that combines a series of data augmentation transformations, such as AutoAugment and RandAugment.

AutoAugment, automatic enhancement.
RandAugment, random enhancement.

The above transformation consists of a set of policies that are randomly transformed as follows:

AutoContrast Automatically adjust image contrast.
Brightness Adjust image brightness.
ColorTransform Adjust the image color balance.
Contrast Adjust image contrast.
Cutout Cut the image.
Equalize Equalizes the image histogram.
GaussianBlur Gaussian blur image.
Invert Inverts the image.
Posterize Posterize the image (reduce the number of bits per color channel).
Rotate Rotate the image.
Sharpness Adjust image sharpness.
Sharpness Adjust image sharpness.
Shear Cut the image.
Solarize Expose the image (invert all pixel values ​​above the threshold).
SolarizeAdd SolarizeAdd image (adds a specific value to pixels below a threshold).
Translate Translate images.
They are at AugTransform Base class for augmentation transformations for RandAugment.

4. MMCV transformation

We also provide a number of transformations in MMCV. You can use them directly in configuration files. The following are some commonly used transformations, the entire transformation list can be found in mmcv.transforms.

Transform wrapper for MultiView

A transform wrapper for multiple views of an image.

5. TorchVision Transformation

We also provide all transformations in TorchVision. You can use them like the following example:

1. 使用一些由 NumpyToPIL 和 PILToNumpy 包围的 TorchVision Augs(推荐)

添加 TorchVision Augs,周围是和dict(type=‘NumpyToPIL’, to_rgb=True),dict(type=‘PILToNumpy’, to_bgr=True),

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='NumpyToPIL', to_rgb=True),     # from BGR in cv2 to RGB  in PIL
    dict(type='torchvision/RandomResizedCrop',size=176),
    dict(type='PILToNumpy', to_bgr=True),     # from RGB  in PIL to BGR in cv2
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

data_preprocessor = dict(
    num_classes=1000,
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,                          # from BGR in cv2 to RGB  in PIL
)
2.使用TorchVision Augs和ToTensor&Normalize

确保“img”在由 TorchVision Augs 处理之前已从 BGR-Numpy 格式转换为 PIL 格式。

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='NumpyToPIL', to_rgb=True),       # from BGR in cv2 to RGB  in PIL
    dict(
        type='torchvision/RandomResizedCrop',
        size=176,
        interpolation='bilinear'),            # accept str format interpolation mode
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    dict(
        type='torchvision/TrivialAugmentWide',
        interpolation='bilinear'),
    dict(type='torchvision/PILToTensor'),
    dict(type='torchvision/ConvertImageDtype', dtype=torch.float),
    dict(
        type='torchvision/Normalize',
        mean=(0.485, 0.456, 0.406),
        std=(0.229, 0.224, 0.225),
    ),
    dict(type='torchvision/RandomErasing', p=0.1),
    dict(type='PackInputs'),
]

data_preprocessor = dict(num_classes=1000, mean=None, std=None, to_rgb=False)  # Normalize in dataset pipeline
3.使用TorchVision Augs,除了ToTensor&Normalize
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='NumpyToPIL', to_rgb=True),   # from BGR in cv2 to RGB  in PIL
    dict(type='torchvision/RandomResizedCrop', size=176, interpolation='bilinear'),
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    dict(type='torchvision/TrivialAugmentWide', interpolation='bilinear'),
    dict(type='PackInputs'),
]

# here the Normalize params is for the RGB format
data_preprocessor = dict(
    num_classes=1000,
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=False,
)

2、数据预处理器

数据预处理器也是在将数据输入神经网络之前处理数据的组件。与数据转换相比,数据预处理器是分类器的一个模块,它需要一批数据来处理,这意味着它可以使用GPU和批处理来加速处理。

MMPreTrain 中的默认数据预处理器可以进行如下预处理:

将数据移动到目标设备。

将输入填充到当前批次的最大大小。

将输入堆叠到批次中。

如果输入的形状为 (3, H, W),则将输入从 bgr 转换为 rgb。

使用定义的标准差和平均值对图像进行标准化。

在训练期间进行 Mixup 和 CutMix 等批量增强。

您可以通过配置文件中的data_preprocessor字段或字段来配置数据预处理器。model.data_preprocessor典型用法如下:

data_preprocessor = dict(
    # RGB format normalization parameters
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,    # convert image from BGR to RGB
)

或者定义model.data_preprocessor如下:

model = dict(
    backbone = ...,
    neck = ...,
    head = ...,
    data_preprocessor = dict(
                         mean=[123.675, 116.28, 103.53],
                         std=[58.395, 57.12, 57.375],
                         to_rgb=True)
    train_cfg=...,
)

请注意, 的model.data_preprocessor优先级高于data_preprocessor。
ClsDataPreprocessor

用于分类任务的图像预处理器。

SelfSupDataPreprocessor

用于操作的图像预处理器,例如标准化和 bgr 到 rgb。

TwoNormDataPreprocessor

CAE、BEiT v1/v2 等图像预处理器

VideoDataPreprocessor

用于操作的视频预处理器,例如标准化和 bgr 到 rgb 转换。

批量增强

批量增强是数据预处理器的一个组件。它涉及多个样本并以某种方式混合它们,例如 Mixup 和 CutMix。
这些增强通常仅在训练期间使用,因此,我们使用该model.train_cfg字段在配置文件中配置它们。

model = dict(
    backbone=...,
    neck=...,
    head=...,
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.8),
        dict(type='CutMix', alpha=1.0),
    ]),
)

您还可以通过字段指定每个批次增强的概率probs。

model = dict(
    backbone=...,
    neck=...,
    head=...,
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.8),
        dict(type='CutMix', alpha=1.0),
    ], probs=[0.3, 0.7])
)

以下是可在 MMPreTrain 中使用的批量增强的列表。

Mixup 混合批量增强。
CutMix CutMix 批量添加。
ResizeMix ResizeMix 随机粘贴一批数据的图层

3、修改训练/测试管道

MMPretrain 中的数据管道非常灵活。您可以从配置文件中控制数据预处理的几乎每一步,但另一方面,面对如此多的选项,您可能会感到困惑。
这是图像分类任务的常见实践和指南。

1、加载

在数据管道的开始,我们通常需要从文件路径加载图像数据。 LoadImageFromFile通常用于完成此任务。

train_pipeline = [
    dict(type='LoadImageFromFile'),
    ...
]

如果要从特殊格式或特殊位置的文件加载数据,可以实现新的加载转换并将其添加到数据管道的开头。

2、增强和其他处理

在训练过程中,我们通常需要进行数据增强以避免过度拟合。在测试过程中,我们还需要进行一些数据处理,例如调整大小和裁剪。这些数据转换将在加载过程之后进行。

这是一个简单的数据增强配方示例。它将随机调整并裁剪输入图像到指定的比例大小,并以概率随机水平翻转图像。

train_pipeline = [
    ...
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    ...
]

这是Swin-Transformer训练中使用的大量数据增强配方示例 。为了与官方实现保持一致,它指定pillow为调整大小后端和bicubic 调整大小算法。此外,它还添加了RandAugment和 RandomErasing作为额外的数据增强方法。

此配置指定了数据增强的每个细节,您只需将其复制到您自己的配置文件即可应用 Swin-Transformer 的数据增强。

bgr_mean = [103.53, 116.28, 123.675]
bgr_std = [57.375, 57.12, 58.395]

train_pipeline = [
    ...
    dict(type='RandomResizedCrop', scale=224, backend='pillow', interpolation='bicubic'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(
        type='RandAugment',
        policies='timm_increasing',
        num_policies=2,
        total_level=10,
        magnitude_level=9,
        magnitude_std=0.5,
        hparams=dict(
            pad_val=[round(x) for x in bgr_mean], interpolation='bicubic')),
    dict(
        type='RandomErasing',
        erase_prob=0.25,
        mode='rand',
        min_area_ratio=0.02,
        max_area_ratio=1 / 3,
        fill_color=bgr_mean,
        fill_std=bgr_std),
    ...
]

通常,数据管道中的数据增强部分仅处理图像级变换,而不处理像图像归一化或混合/剪切之类的变换。这是因为我们可以对批量数据进行图像归一化和混合/剪切混合来加速。要配置图像标准化和混合/剪切混合,请使用数据预处理器。

3、格式化

格式化就是从数据信息字典中收集训练数据,并将这些数据转换为模型友好的格式。

大多数情况下,您可以简单地使用PackInputs,它会将 NumPy 数组格式的图像转换为 PyTorch 张量,并将真实类别信息和其他元信息打包为DataSample.

train_pipeline = [
    ...
    dict(type='PackInputs'),
]

4、添加新的数据转换

在任意文件(例如 )中写入新的数据转换my_transform.py,并将其放置在文件夹 中mmpretrain/datasets/transforms/。数据转换类需要继承该类mmcv.transforms.BaseTransform并重写transform以字典作为输入并返回字典的方法。

from mmcv.transforms import BaseTransform
from mmpretrain.registry import TRANSFORMS

@TRANSFORMS.register_module()
class MyTransform(BaseTransform):

    def transform(self, results):
        # Modify the data information dict `results`.
        return results

将新类导入到mmpretrain/datasets/transforms/init.py.

...
from .my_transform import MyTransform

__all__ = [
    ..., 'MyTransform'
]

在配置文件中使用它。

train_pipeline = [
    ...
    dict(type='MyTransform'),
    ...
]

5、管道可视化

设计数据管道后,您可以使用可视化工具查看性能。
介绍数据集可视化工具

python tools/visualization/browse_dataset.py \
    ${CONFIG_FILE} \
    [-o, --output-dir ${OUTPUT_DIR}] \
    [-p, --phase ${DATASET_PHASE}] \
    [-n, --show-number ${NUMBER_IMAGES_DISPLAY}] \
    [-i, --show-interval ${SHOW_INTERRVAL}] \
    [-m, --mode ${DISPLAY_MODE}] \
    [-r, --rescale-factor ${RESCALE_FACTOR}] \
    [-c, --channel-order ${CHANNEL_ORDER}] \
    [--cfg-options ${CFG_OPTIONS}]

Description of all arguments:
config : The path of a model config file.
-o, --output-dir: The output path for visualized images. If not specified, it will be set to ‘’, which means not to save.
-p, --phase: Phase of visualizing dataset,must be one of [‘train’, ‘val’, ‘test’]. If not specified, it will be set to ‘train’.
-n, --show-number: The number of samples to visualized. If not specified, display all images in the dataset.
–show-interval: The interval of show (s).
-m, --mode: The display mode, can be one of [‘original’, ‘transformed’, ‘concat’, ‘pipeline’]. If not specified, it will be set to ‘transformed’.
-r, --rescale-factor: The image rescale factor, which is useful if the output is too large or too small in the original mode.
-c, --channel-order: The channel of the showing images, could be “BGR” or “RGB”, If not specified, it will be set to ‘BGR’.
–cfg-options : Modifications to the configuration file, refer to Learn about Configs.

注意
The -m, --mode is about display mode, display original pictures or transformed pictures or comparison pictures:
“original” means show images load from disk;
“transformed” means to show images after transformed;
“concat” means show images stitched by “original” and “transformed” images;
“pipeline” means show all the intermediate images throghout the pipeline.
The -r, --rescale-factor option is set when the label information is too large or too small relative to the picture. For example, when visualizing the CIFAR dataset, since the resolution of the image is very small, --rescale-factor can be set to 10.

How to visualize the original image

In ‘original’ mode:

python ./tools/visualization/browse_dataset.py ./configs/resnet/resnet101_8xb16_cifar10.py --phase val --output-dir tmp --mode original --show-number 100 --rescale-factor 10 --channel-order RGB

–phase val:视觉验证集,可以简化为;-p val

–output-dir tmp:可视化结果保存在“tmp”文件夹中,可以简化为;-o tmp

–mode original:将原始图像可视化,可以简化为;-m original

–show-number 100:可视化 100 张图像,可以简化为;-n 100

–rescale-factor:图像放大10倍,可简化为;-r 10

–channel-order RGB:可视化图像的通道顺序为“RGB”,可以简化为。-c RGB
insert image description here

如何可视化转换后的图像

在“转换”模式下:

python ./tools/visualization/browse_dataset.py ./configs/resnet/resnet50_8xb32_in1k.py -n 100

insert image description here

如何将转换后的图像和原始图像一起可视化

在“连接”模式下:

python ./tools/visualization/browse_dataset.py configs/swin_transformer/swin-small_16xb64_in1k.py -n 10 -m concat

insert image description here
在“管道”模式下:

python ./tools/visualization/browse_dataset.py configs/swin_transformer/swin-small_16xb64_in1k.py -m pipeline

insert image description here

python ./tools/visualization/browse_dataset.py configs/beit/beit_beit-base-p16_8xb256-amp-coslr-300e_in1k.py -m pipeline

insert image description here

3、自定义模型

在我们的设计中,一个完整的模型被定义为一个顶层模块,其中包含基于其功能的几个模型组件。
model:顶层模块定义任务的类型,例如Imageclassifier用于图像分类,MAE用于自监督学习,ImageToImageRetriever用于图像检索。
backbone:通常是一个特征提取网络,记录模型之间的主要差异,如ResNet, MobileNet。
neck:backbone和head之间的部分,如GlobalAveragePooling。
head:特定任务的组件,例如clhead, ContrastiveHead。
loss:磁头中用于计算损耗的组件,如CrossEntropyLoss、LabelSmoothLoss。target_generator:专门用于自监督学习任务的组件,如vokp、HoGGenerator。

Add a new model

一般来说,对于图像分类和检索任务,管道是一致的。然而,管道不同于每个自监督学习算法,如MAE和BEiT。因此,在本节中,我们将解释如何添加自监督学习算法。

增加一个新的自监督学习算法

创建一个新文件mmpretrain/models/selfsup/new_algorithm.py,并在其中实现newalgalgorithm。

from mmpretrain.registry import MODELS
from .base import BaseSelfSupvisor


@MODELS.register_module()
class NewAlgorithm(BaseSelfSupvisor):

    def __init__(self, backbone, neck=None, head=None, init_cfg=None):
        super().__init__(init_cfg)
        pass

    # ``extract_feat`` function is defined in BaseSelfSupvisor, you could
    # overwrite it if needed
    def extract_feat(self, inputs, **kwargs):
        pass

    # the core function to compute the loss
    def loss(self, inputs, data_samples, **kwargs):
        pass

导入mmpretrain/models/selfsup/init_.py文件中的新算法模块

...
from .new_algorithm import NewAlgorithm

__all__ = [
    ...,
    'NewAlgorithm',
    ...
]

Use it in your config file.

model = dict(
    type='NewAlgorithm',
    backbone=...,
    neck=...,
    head=...,
    ...
)

Add a new backbone

在这里,我们通过一个ResNet_CIFAR的例子来展示如何开发一个新的骨干组件。由于CIFAR的输入大小为32x32,远小于ImageNet的默认大小224x224,因此该主干将kernel_size=7, stride=2替换为kernel_size=3, stride=1,并在主干层后删除MaxPooling,避免将小的特征映射转发到剩余块。

import torch.nn as nn

from mmpretrain.registry import MODELS
from .resnet import ResNet


@MODELS.register_module()
class ResNet_CIFAR(ResNet):

    """ResNet backbone for CIFAR.

    short description of the backbone

    Args:
        depth(int): Network depth, from {
    
    18, 34, 50, 101, 152}.
        ...
    """

    def __init__(self, depth, deep_stem, **kwargs):
        # call ResNet init
        super(ResNet_CIFAR, self).__init__(depth, deep_stem=deep_stem, **kwargs)
        # other specific initializations
        assert not self.deep_stem, 'ResNet_CIFAR do not support deep_stem'

    def _make_stem_layer(self, in_channels, base_channels):
        # override the ResNet method to modify the network structure
        self.conv1 = build_conv_layer(
            self.conv_cfg,
            in_channels,
            base_channels,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False)
        self.norm1_name, norm1 = build_norm_layer(
            self.norm_cfg, base_channels, postfix=1)
        self.add_module(self.norm1_name, norm1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Customize the forward method if needed.
        x = self.conv1(x)
        x = self.norm1(x)
        x = self.relu(x)
        outs = []
        for i, layer_name in enumerate(self.res_layers):
            res_layer = getattr(self, layer_name)
            x = res_layer(x)
            if i in self.out_indices:
                outs.append(x)
        # The return value needs to be a tuple with multi-scale outputs from different depths.
        # If you don't need multi-scale features, just wrap the output as a one-item tuple.
        return tuple(outs)

    def init_weights(self):
        # Customize the weight initialization method if needed.
        super().init_weights()

        # Disable the weight initialization if loading a pretrained model.
        if self.init_cfg is not None and self.init_cfg['type'] == 'Pretrained':
            return

        # Usually, we recommend using `init_cfg` to specify weight initialization methods
        # of convolution, linear, or normalization layers. If you have some special needs,
        # do these extra weight initialization here.
        ...

在mmpretrain/models/backbone /init.py中导入新的骨干模块。

...
from .resnet_cifar import ResNet_CIFAR

__all__ = [
    ..., 'ResNet_CIFAR'
]

修改配置文件中的相关设置。

model = dict(
    ...
    backbone=dict(
        type='ResNet_CIFAR',
        depth=18,
        ...),
    ...

Add a new neck

这里我们以GlobalAveragePooling为例。这是一个非常简单的脖子,没有任何争论。为了增加一个新的颈部,我们主要实现了转发函数,它对脊柱的输出进行一些操作,并将结果转发到头部。
在mmpretrain/models/neck /gap.py中创建一个新文件。

import torch.nn as nn

from mmpretrain.registry import MODELS

@MODELS.register_module()
class GlobalAveragePooling(nn.Module):

    def __init__(self):
        self.gap = nn.AdaptiveAvgPool2d((1, 1))

    def forward(self, inputs):
        # we regard inputs as tensor for simplicity
        outs = self.gap(inputs)
        outs = outs.view(inputs.size(0), -1)
        return outs

在mmpretrain/models/颈部/init.py中导入新的颈部模块。

...
from .gap import GlobalAveragePooling

__all__ = [
    ..., 'GlobalAveragePooling'
]

修改配置文件中的相关设置。

model = dict(
    neck=dict(type='GlobalAveragePooling'),
)

Add a new head

Based on ClsHead

在这里,我们展示了如何通过简化的visionTransformerClsHead的例子来开发一个新的头部,如下所示。为了实现一个新的分类头,我们需要在最终分类头之前为进程实现一个pre_logits方法和一个转发方法。
为什么我们需要pre_logits方法?
在分类任务中,我们通常使用线性laver进行最终分类,有时我们需要在最终分类之前获得特征,这就是pre_logits方法的输出。
在mmpretrain/models/heads/vit_head.py中创建一个新文件。

import torch.nn as nn

from mmpretrain.registry import MODELS
from .cls_head import ClsHead


@MODELS.register_module()
class VisionTransformerClsHead(ClsHead):

    def __init__(self, num_classes, in_channels, hidden_dim, **kwargs):
        super().__init__(**kwargs)
        self.in_channels = in_channels
        self.num_classes = num_classes
        self.hidden_dim = hidden_dim

        self.fc1 = nn.Linear(in_channels, hidden_dim)
        self.act = nn.Tanh()
        self.fc2 = nn.Linear(hidden_dim, num_classes)

    def pre_logits(self, feats):
        # The output of the backbone is usually a tuple from multiple depths,
        # and for classification, we only need the final output.
        feat = feats[-1]

        # The final output of VisionTransformer is a tuple of patch tokens and
        # classification tokens. We need classification tokens here.
        _, cls_token = feat

        # Do all works except the final classification linear layer.
        return self.act(self.fc1(cls_token))

    def forward(self, feats):
        pre_logits = self.pre_logits(feats)

        # The final classification linear layer.
        cls_score = self.fc2(pre_logits)
        return cls_score

导入mmpretrain/models/heads/ init .py文件中的模块。

...
from .vit_head import VisionTransformerClsHead

__all__ = [
    ..., 'VisionTransformerClsHead'
]

修改配置文件中的相关设置。

model = dict(
    head=dict(
        type='VisionTransformerClsHead',
        ...,
    ))

Based on BaseModule
MAEPretrainHead这是一个基于BaseModule掩模图像建模任务并实现的示例。需要实现loss生成损失的函数,但其​​他辅助函数是可选的。

# Copyright (c) OpenMMLab. All rights reserved.
import torch
from mmengine.model import BaseModule

from mmpretrain.registry import MODELS


@MODELS.register_module()
class MAEPretrainHead(BaseModule):
    """Head for MAE Pre-training."""

    def __init__(self,
                 loss: dict,
                 norm_pix: bool = False,
                 patch_size: int = 16) -> None:
        super().__init__()
        self.norm_pix = norm_pix
        self.patch_size = patch_size
        self.loss_module = MODELS.build(loss)

    def patchify(self, imgs: torch.Tensor) -> torch.Tensor:
        """Split images into non-overlapped patches."""
        p = self.patch_size
        assert imgs.shape[2] == imgs.shape[3] and imgs.shape[2] % p == 0

        h = w = imgs.shape[2] // p
        x = imgs.reshape(shape=(imgs.shape[0], 3, h, p, w, p))
        x = torch.einsum('nchpwq->nhwpqc', x)
        x = x.reshape(shape=(imgs.shape[0], h * w, p**2 * 3))
        return x

    def construct_target(self, target: torch.Tensor) -> torch.Tensor:
        """Construct the reconstruction target."""
        target = self.patchify(target)
        if self.norm_pix:
            # normalize the target image
            mean = target.mean(dim=-1, keepdim=True)
            var = target.var(dim=-1, keepdim=True)
            target = (target - mean) / (var + 1.e-6)**.5

        return target

    def loss(self, pred: torch.Tensor, target: torch.Tensor,
             mask: torch.Tensor) -> torch.Tensor:
        """Generate loss."""
        target = self.construct_target(target)
        loss = self.loss_module(pred, target, mask)

        return loss

实现后,以下步骤与基于ClsHead中的步骤2和步骤3相同。

Add a new loss

为了增加新的损失函数,我们主要在损失模块中实现前向函数。我们也应该将loss模块注册为model。此外,利用修饰器weighted_loss对每个元素的损失进行加权也是有帮助的。假设我们想要模拟由另一个分类模型生成的概率分布,我们实现一个L1Loss来实现如下目的。
在mmpretrain/models/losses/ 11_losses .py中创建一个新文件。

import torch
import torch.nn as nn

from mmpretrain.registry import MODELS
from .utils import weighted_loss

@weighted_loss
def l1_loss(pred, target):
    assert pred.size() == target.size() and target.numel() > 0
    loss = torch.abs(pred - target)
    return loss

@MODELS.register_module()
class L1Loss(nn.Module):

    def __init__(self, reduction='mean', loss_weight=1.0):
        super(L1Loss, self).__init__()
        self.reduction = reduction
        self.loss_weight = loss_weight

    def forward(self,
                pred,
                target,
                weight=None,
                avg_factor=None,
                reduction_override=None):
        assert reduction_override in (None, 'none', 'mean', 'sum')
        reduction = (
            reduction_override if reduction_override else self.reduction)
        loss = self.loss_weight * l1_loss(
            pred, target, weight, reduction=reduction, avg_factor=avg_factor)
        return loss

导入mmpretrain/models/losses/init.py文件中的模块。

...
from .l1_loss import L1Loss

__all__ = [
    ..., 'L1Loss'
]

修改头部配置中的loss字段。

model = dict(
    head=dict(
        loss=dict(type='L1Loss', loss_weight=1.0),
    ))

最后,我们可以将所有新的模型组件组合到一个配置文件中,为最佳实践创建一个新的模型。因为ResNet_CIFAR不是一个基于vit的主干,所以我们在这里没有实现VisionTransformerClsHead。

model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='ResNet_CIFAR',
        depth=18,
        num_stages=4,
        out_indices=(3, ),
        style='pytorch'),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='LinearClsHead',
        num_classes=10,
        in_channels=512,
        loss=dict(type='L1Loss', loss_weight=1.0),
        topk=(1, 5),
    ))

4、定制培训计划

在我们的代码库中,已经为 CIFAR、ImageNet 等常见数据集提供了默认的训练计划。如果我们尝试在这些数据集上进行实验以获得更高的准确性或在不同的新方法和数据集上进行实验,我们可能需要修改策略。

在本教程中,我们将介绍如何修改配置来构建优化器,使用参数化精细配置、梯度裁剪、梯度累积以及自定义学习率和动量计划。此外,引入模板来为项目定制自行实现的优化方法。

Customize optimization

我们使用optim_wrapper字段来配置优化策略,包括优化器的选择、自动混合精确训练的选择、参数化配置、梯度裁剪和累积。详情如下。

1、使用PyTorch支持的优化器

我们支持PyTorch实现的所有优化器,要使用它们,请更改配置文件的optimizer字段。例如,如果您想使用esgd,那么配置文件中的修改可以如下所示。注意,与优化相关的设置应该全部包装在optim_wrapper中。

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.0003, weight_decay=0.0001)
)

在PyTorch中,optimizer中的type不是构造函数,而是优化器的名称。参考PyTorch支持的优化器列表以获得更多选择。

例如,如果您想使用PyTorch 中的Adam设置。您可以使用下面的配置:torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer = dict(
        type='Adam',
        lr=0.001,
        betas=(0.9, 0.999),
        eps=1e-08,
        weight_decay=0,
        amsgrad=False),
)

字段的默认类型optim_wrapper是OptimWrapper,因此,通常可以省略类型字段,例如:

optim_wrapper = dict(
optimizer=dict(
type=‘Adam’,
lr=0.001,
betas=(0.9, 0.999),
eps=1e-08,
weight_decay=0,
amsgrad=False))

2、使用 AMP 训练

如果我们想使用自动混合精确训练,我们可以简单地在配置文件中更改optim_wrapper类型to AmpOptimwrapper的。

optim_wrapper = dict(type='AmpOptimWrapper', optimizer=...)

或者,为了方便起见,我们可以在tools/train.pyscript中设置——amp参数来直接打开amp选项。有关开始培训的详细信息,请参阅培训教程。

3、参数精细配置

某些模型可能具有特定于参数的优化设置,例如,BatchNorm 层没有权重衰减或对不同网络层使用不同的学习率。为了精细地配置它们,我们可以使用paramwise_cfg中的参数optim_wrapper。
为不同类型的参数设置不同的超参数乘数。
例如,我们可以设置norm_decay_mul=0在paramwise_cfg中改变归一化层的权重衰减和偏置为零。

optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.8, weight_decay=1e-4),
    paramwise_cfg=dict(norm_decay_mult=0.))

支持配置更多类型的参数,列表如下:
bias_lr_mult:偏差学习率的乘数(不包括归一化层的偏差和可变形卷积层的偏移)。默认为 1。
bias_decay_mult:偏差权重衰减的乘数(不包括归一化层的偏差和可变形卷积层的偏移)。默认为 1。
norm_decay_mult:权重衰减和归一化层偏差的乘数。默认为 1。
flat_decay_mult:所有一维参数权重衰减的乘数。默认为 1。
dwconv_decay_mult:深度卷积层权重衰减的乘数。默认为 1。
bypass_duplicate:是否绕过重复参数。默认为False.
dcn_offset_lr_mult:可变形卷积层学习率的乘数。默认为 1。
为特定参数设置不同的超参数乘数。

MMPretrain 可以用来custom_keys指定paramwise_cfg不同的参数以使用不同的学习率或权重衰减。

例如,要将所有学习率和权重衰减设置为backbone.layer00,其余部分backbone与优化器相同,并将学习率设置head为 0.001,请使用以下配置。

optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, weight_decay=0.0001),
    paramwise_cfg=dict(
        custom_keys={
    
    
            'backbone.layer0': dict(lr_mult=0, decay_mult=0),
            'backbone': dict(lr_mult=1),
            'head': dict(lr_mult=0.1)
        }))

4、Gradient clipping

在训练过程中,损失函数可能会接近陡峭区域并导致梯度爆炸。并且梯度裁剪有助于稳定训练过程。更多介绍可以在这个页面找到。

目前我们支持梯度裁剪clip_grad选项optim_wrapper,请参阅PyTorch 文档。
这是一个例子:

optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, weight_decay=0.0001),
    # norm_type: type of the used p-norm, here norm_type is 2.
    clip_grad=dict(max_norm=35, norm_type=2))

5、梯度累积

当计算资源缺乏时,batch size只能设置为较小的值,这可能会影响模型的性能。可以利用梯度累积来解决这个问题。我们支持accumulative_counts选项在optim_wrapper梯度累积

train_dataloader = dict(batch_size=64)
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, weight_decay=0.0001),
    accumulative_counts=4)

表示在训练过程中,每 4 迭代进行一次反向传播。上面等价于:

train_dataloader = dict(batch_size=256)
optim_wrapper = dict(
    optimizer=dict(type='SGD', lr=0.01, weight_decay=0.0001))

5、自定义评估指标

在 MMPretrain 中使用指标
在MMPretrain中,我们为单标签分类和多标签分类提供了多个指标:

单标签分类:
Accuracy
SingleLabelMetric,包括准确率、召回率、f1 分数和支持度。

多标签分类:
AveragePrecision,或 AP (mAP)。
MultiLabelMetric,包括准确率、召回率、f1 分数和支持度。

要在验证和测试期间使用这些指标,我们需要修改 配置文件中的val_evaluator和test_evaluator字段。

这里有几个例子:
在验证和测试期间计算 top-1 和 top-5 准确度。

val_evaluator = dict(type='Accuracy', topk=(1, 5))
test_evaluator = val_evaluator

在验证和测试期间计算 top-1 准确度、top-5 准确度、精确度和召回率。

val_evaluator = [
  dict(type='Accuracy', topk=(1, 5)),
  dict(type='SingleLabelMetric', items=['precision', 'recall']),
]
test_evaluator = val_evaluator

计算 mAP(平均平均精度)、CP(类平均精度)、CR(类平均召回率)、CF(类平均 F1 分数)、OP(总体平均精度)、OR(总体平均召回率)和OF1(总体平均 F1 分数)。

val_evaluator = [
  dict(type='AveragePrecision'),
  dict(type='MultiLabelMetric', average='macro'),  # class-wise mean
  dict(type='MultiLabelMetric', average='micro'),  # overall mean
]
test_evaluator = val_evaluator

添加新指标
MMPretrain支持为追求更高定制化的用户实现定制化的评价指标。

您需要在 下创建一个新文件mmpretrain/evaluation/metrics,并在该文件中实现新指标,例如在 中mmpretrain/evaluation/metrics/my_metric.py。MyMetric并创建一个继承的自定义评估指标类。BaseMetric in MMEngine

数据格式处理方法process和度量计算方法compute_metrics需要分别重写。将其添加到METRICS注册表以实施任何自定义的评估指标。

from mmengine.evaluator import BaseMetric
from mmpretrain.registry import METRICS

@METRICS.register_module()
class MyMetric(BaseMetric):

    def process(self, data_batch: Sequence[Dict], data_samples: Sequence[Dict]):
    """ The processed results should be stored in ``self.results``, which will
        be used to computed the metrics when all batches have been processed.
        `data_batch` stores the batch data from dataloader,
        and `data_samples` stores the batch outputs from model.
    """
        ...

    def compute_metrics(self, results: List):
    """ Compute the metrics from processed results and returns the evaluation results.
    """
        ...

然后,将其导入以mmpretrain/evaluation/metrics/init.py将其添加到mmpretrain.evaluation包中。

# In mmpretrain/evaluation/metrics/__init__.py
...
from .my_metric import MyMetric

__all__ = [..., 'MyMetric']

MyMetric最后,在配置文件的val_evaluator和字段中使用test_evaluator。

在这里插入代码片

6、MMPRETRAIN 中的约定

1、型号命名约定

我们遵循以下约定来命名模型。建议贡献者遵循相同的风格。模型名称分为五部分:算法信息、模块信息、预训练信息、训练信息和数据信息。逻辑上,不同部分用下划线连接’_‘,同一部分的单词用破折号连接’-'。

{
    
    algorithm info}_{
    
    module info}_{
    
    pretrain info}_{
    
    training info}_{
    
    data info}

algorithm info(可选):主要算法信息,包括MAE、BEiT等主要训练算法。
module info:模块信息,通常包括backbone名称,如resnet、vit等。
pretrain info:(可选):预训练模型信息,例如预训练模型是在ImageNet-21k上训练的。
training info:训练信息,一些训练计划,包括批量大小、lr 计划、数据增强等。
data info:数据信息,通常包括数据集名称、输入大小等,如imagenet、cifar等。

1、算法信息

训练模型的主要算法名称。例如:
simclr
mocov2
eva-mae-style

通过监督图像分类训练的模型可以省略该字段。

2、模块信息

模型的模块,通常,骨干必须包含在该字段中,颈部和头部信息可以省略。例如:

resnet50
vit-base-p16
swin-base

3、训练前信息

如果模型是预训练模型的微调模型,我们需要记录预训练模型的一些信息。例如:

预训练模型来源:fb、openai等
训练预训练模型的方法:clip、mae、distill等。
预训练使用的数据集:in21k、laion2b等(in1k可省略)
培训时长:300e、1600e等
并非所有信息都是必需的,仅选择必要的信息来区分不同的预训练模型。

在此字段的末尾,使用 a-pre作为标识符,例如mae-in21k-pre。

4、培训信息

训练计划,包括训练类型、数据增强、特殊损失函数等:batch sizelr schedule

格式,如{gpu x batch_per_gpu}8xb32

训练类型(主要见于Transformer网络,比如ViT算法,通常分为两种训练类型:预训练和微调):
ft:用于微调的配置文件
pt:预训练的配置文件

训练食谱。通常只会对与原论文不同的部分进行标记。这些方法将按顺序排列。{pipeline aug}-{train aug}-{loss trick}-{scheduler}-{epochs}

coslr-200e:使用cosine调度器训练200个epoch
autoaug-mixup-lbs-coslr-50e:使用autoaug, mixup, ,训练 50 个 epoch,label smooth cosine scheduler

如果模型是从第三方存储库(如官方存储库)转换而来的,则可以省略训练信息并使用 a3rdparty作为标识符。

5、数据信息

in1k:ImageNet1k数据集,默认使用输入图像尺寸224x224;
in21k:ImageNet21k数据集,也叫ImageNet22k数据集,默认使用输入图像大小224x224;
in1k-384px:表示输入图像尺寸为384x384;
cifar100

6、型号名称示例

vit-base-p32_clip-openai-pre_3rdparty_in1k

vit-base-p32:模块信息
clip-openai-pre:训练前信息。
clip:预训练方法是clip。
openai:预训练模型来自OpenAI。
pre:预训练标识符。
3rdparty:模型是从第三方存储库转换而来的。
in1k:数据集信息。该模型是根据 ImageNet-1k 数据集进行训练的,输入大小为224x224。

beit_beit-base-p16_8xb256-amp-coslr-300e_in1k

beit:算法信息
beit-base:模块信息,由于主干是从BEiT修改而来的ViT,所以主干名称也是beit。
8xb256-amp-coslr-300e:培训信息。
8xb256:使用 8 个 GPU,每个 GPU 上的批量大小为 256。
amp:使用自动混合精度训练。
coslr:使用余弦退火学习率调度程序。
300e:训练 300 个 epoch。
in1k:数据集信息。该模型是根据 ImageNet-1k 数据集进行训练的,输入大小为224x224。

2、配置文件命名约定

The naming of the configuration files is almost the same as the model name, but there are a few differences:
the training information is necessary, not necessary 3rdparty.
If the configuration file only contains backbone settings, there are neither header settings nor dataset settings. We named it . Such profiles are often used for third-party pretrained models on large datasets. {module info}_headless.py
checkpoint naming convention
Weight naming mainly includes model name, date and hash value.

{
    
    model_name}_{
    
    date}-{
    
    hash}.pth

Guess you like

Origin blog.csdn.net/qq_41627642/article/details/131602348