How to quickly run any semantic segmentation algorithm on your own dataset, and improve it, with the mmsegmentation algorithm library (1)

Before reading this article, fellow alchemists, let this humble Daoist briefly introduce its content. This series is for undergraduate and graduate Daoists who wield the mmseg algorithm library magic weapon to do semantic segmentation for graduation projects and research papers. In it I will introduce a new, non-invasive way of using the mmsegmentation magic weapon, as well as some common improvement methods, including but not limited to: backbone improvements (replace the backbone, use dilated convolutions, insert plug-and-play attention modules); segmentation head improvements (replace the segmentation head, design auxiliary segmentation heads); and loss function improvements (combine multiple loss functions, weight the loss by segmentation class, or weight each loss term). Frankly, this humble Daoist does not believe these innovations alone will carry every fellow Daoist through the tribulation of a small paper; please treat them rationally, for reference and study only!


An introduction to the mmsegmentation magic weapon

Here is the introduction from that great sect, OpenMMLab:

MMSegmentation is an algorithm toolbox for semantic segmentation in the OpenMMLab open-source project. It implements many high-quality semantic segmentation models and datasets, and provides a unified framework and benchmark for the task.

The main features of this magic weapon are as follows:

1. Unified benchmark platform:
Various semantic segmentation algorithms are integrated into a unified toolbox for benchmarking.
2. Modular design:
MMSegmentation decouples the segmentation framework into modular components. By combining different components, users can easily build a custom segmentation model (see the sketch after this list).
3. Rich plug-and-play algorithms and models:
MMSegmentation supports many mainstream and recent segmentation algorithms, such as PSPNet, DeepLabV3, PSANet, DeepLabV3+, etc.
4. Fast speed:
Training speed is faster than, or comparable to, other semantic segmentation codebases.
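As a sketch of point 2, swapping a component is purely a config-level change. For example, replacing an FCN decode head with PSPNet's pyramid pooling head only touches one dict; the channel values below are the common ResNet-50 defaults, shown for illustration:

decode_head = dict(
    type='PSPHead',          # was 'FCNHead'
    in_channels=2048,
    in_index=3,
    channels=512,
    pool_scales=(1, 2, 3, 6),
    dropout_ratio=0.1,
    num_classes=21,
    norm_cfg=dict(type='SyncBN', requires_grad=True),
    align_corners=False,
    loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0))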

To make it easier to learn more about this magic weapon, this humble Daoist attaches a few teleportation arrays:
If you want to install it, take this one: mmseg installation guide.
If you want to study it further, take this one: mmseg documentation.
(I assume all of you have cultivated past the tenth layer of qi refinement and already know how to install and configure cuda/cudnn/pytorch; any fellow Daoist below that level, please step into this teleportation array for the installation guide. A typical sequence is also sketched below.)
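For orientation only, a typical mmseg 0.x installation looks roughly like this; the mmcv-full wheel URL must match your own CUDA and PyTorch versions, so treat cu113/torch1.11 below as placeholders and follow the official guide:

pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11/index.html
git clone https://github.com/open-mmlab/mmsegmentation.git
cd mmsegmentation
pip install -e .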


How to use the magic weapon without harm

Since this magic weapon first appeared, many colleagues have published ways of wielding it, but in my view most of them harm both the weapon and its user: they require modifying mmseg's underlying code and its original config files. As a result, nothing generalizes across datasets; everything must be changed again for every run. Moreover, most of these recipes give users no intuitive, editable picture of the network they are training, leaving them in the helpless state of being able to run the code but not knowing how to change it. This humble Daoist struggled to find a workaround, was inspired while explaining the mmdetection algorithm library, experimented successfully, and now shares the result with fellow practitioners. (Again: for reference only.)

1. The magic of print_config

Daoists with some understanding of the mm-series algorithm libraries know that a run calls a config file to initialize many important things: model construction, the learning schedule, checkpointing, and so on. If you can take the configuration into your own hands, you hold the model's important parameters tightly, and you can both run it and change it at will! But open a config file in the mmseg algorithm library and a different scene greets you. Here we take the FCN config as an example (with the r50-d8 backbone, i.e. ResNet-50 with dilated stages giving an output stride of 8):
As the sketch below shows, FCN's config file merely inherits base config files for the backbone/model, the dataset and augmentation, the runtime, and the learning schedule. To adapt the model and training to your own dataset, fellow Daoists would therefore have to modify those base files one by one, which is exactly what many published recipes are built on. This humble Daoist considers it the worst policy. First, changing the files one by one is tedious; second, if you later want to run a public dataset again, must you change everything back? In short, the stock mmseg config files lack a single, self-contained statement of the whole experiment.
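For reference, the FCN config in question is essentially just an inheritance list; in mmseg 0.x it looks roughly like this (reconstructed from memory, so the exact relative paths may differ by version):

# configs/fcn/fcn_r50-d8_512x512_20k_voc12aug.py (sketch)
_base_ = [
    '../_base_/models/fcn_r50-d8.py',
    '../_base_/datasets/pascal_voc12_aug.py',
    '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_20k.py'
]
model = dict(
    decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21))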
At this point this humble Daoist introduces the print_config tool to fellow Daoists. Whatever purpose it was originally written for, it suits our needs perfectly: it can export a flattened config file for any algorithm under the configs folder, with every entry written out explicitly. The command is as follows:

python tools/print_config.py configs/fcn/fcn_r50-d8_512x512_20k_voc12aug.py > my_config/fcn.py

(The first path is the original config file; the redirect target my_config/fcn.py is where the flattened config is saved.)
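If you prefer to do the same thing from Python, here is a minimal sketch using mmcv's Config API (mmseg 0.x is built on mmcv; cfg.pretty_text also avoids the extra header line discussed in section 2.3):

from mmcv import Config

# load the config, resolving all _base_ inheritance
cfg = Config.fromfile('configs/fcn/fcn_r50-d8_512x512_20k_voc12aug.py')
# write the flattened config as plain Python text
with open('my_config/fcn.py', 'w') as f:
    f.write(cfg.pretty_text)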

The effect is as follows:

norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='FCNHead',
        in_channels=2048,
        in_index=3,
        channels=512,
        num_convs=2,
        concat_input=True,
        dropout_ratio=0.1,
        num_classes=2,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=2,
        norm_cfg=dict(type='SyncBN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.4)),
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
dataset_type = 'PascalVOCDataset'
classes = ('background', 'leaky oil')
palette = [[128, 0, 0]]
data_root = 'data/VOCdevkit/VOC2012'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='PascalVOCDataset',
        data_root='data/VOCdevkit/VOC2012',
        img_dir='JPEGImages',
        classes=('background', 'leaky oil'),
        palette=[[128, 0, 0]],
        ann_dir=['SegmentationClass'],
        split=[
            'ImageSets/Segmentation/train.txt'
        ],
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)),
            dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
            dict(type='RandomFlip', prob=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ]),
    val=dict(
        type='PascalVOCDataset',
        data_root='data/VOCdevkit/VOC2012',
        img_dir='JPEGImages',
        classes=('background', 'leaky oil'),
        palette=[[128, 0, 0]],
        ann_dir='SegmentationClass',
        split='ImageSets/Segmentation/val.txt',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='PascalVOCDataset',
        data_root='data/VOCdevkit/VOC2012',
        img_dir='JPEGImages',
        classes=('background', 'leaky oil'),
        palette=[[128, 0, 0]],
        ann_dir='SegmentationClass',
        split='ImageSets/Segmentation/val.txt',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 512),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
log_config = dict(
    interval=100, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = 'checkpoints/fcn_r50-d8_512x512_20k_voc12.pth'
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=0.0001, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=10000)
checkpoint_config = dict(by_epoch=False, interval=1000)
evaluation = dict(
    interval=1000, metric=['mIoU', 'mDice', 'mFscore'], pre_eval=True)
work_dir = 'work_dir/fcn'
gpu_ids = range(0, 2)
auto_resume = False

You can see clearly that the important configs (model, data loading, learning schedule, runtime) are all integrated into this one file, and that it is independent of the config files shipped with the mmseg algorithm library, so the underlying code is never touched.
With this, running the mmseg library on your own dataset, and improving it, is already a big step closer!

2. Specific training methods

Here we take the VOC dataset as an example. If you do not know how to convert your own dataset into VOC format, please step into this teleportation array to this humble Daoist's earlier post: Quickly create your own VOC semantic segmentation dataset.

2.1 Dataset preparation

After preparing the VOC2012 dataset, its directory layout is as follows (only the folders used by this config are shown):
VOC2012
├── JPEGImages                  (input images)
├── SegmentationClass           (segmentation label masks)
└── ImageSets
    └── Segmentation
        ├── train.txt
        └── val.txt
Create a data folder under the mmsegmentation root directory, create a VOCdevkit folder under data, and copy VOC2012 into VOCdevkit, so that the config's data_root of data/VOCdevkit/VOC2012 resolves. With this, the dataset preparation work is complete.

2.2 Configuration file export

Create a my_config folder under the mmsegmentation root to hold your own configuration files, find the config file of the model you want to run, and export it with the following command (taking FCN as an example):

python tools/print_config.py configs/fcn/fcn_r50-d8_512x512_20k_voc12aug.py > my_config/fcn.py

2.3 Configuration file modification

1) Delete the first line of the exported config: print_config prefixes its dump with a Config: header line, which is not valid Python.

2) Change num_classes in the decode_head and in the auxiliary_head to your own number of training classes (see the sketch after step 3).

3) Add classes and palette entries under each of data.train, data.val and data.test.
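A minimal sketch of these two edits, with the values from this article's exported config above (also note the remark at the end of this section: classes and palette colors must correspond one-to-one):

# step 2): in model.decode_head and model.auxiliary_head
decode_head = dict(
    # ...all other keys as exported above...
    num_classes=2)

# step 3): under each of data.train, data.val and data.test
train = dict(
    # ...all other keys as exported above...
    classes=('background', 'leaky oil'),
    palette=[[128, 0, 0]])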

4) Adjust the learning rate. Each model's page on the mmseg GitHub has a training log; open it, use Ctrl+F to search for nGPU, and scale your learning rate against the number of GPUs it reports (a worked sketch follows). Clicking the model entry on the same page downloads the pre-trained weights (log address and download address).
5) Pre-training weight loading
Create a new checkpoints folder under the mmsegmentation root, place the downloaded pre-trained weights in it, and modify the load_from parameter in the config file accordingly:
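Using the checkpoint path from the exported config above:

load_from = 'checkpoints/fcn_r50-d8_512x512_20k_voc12.pth'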
6) Add evaluation metrics. In the exported config this is the evaluation entry:

evaluation = dict(
    interval=1000, metric=['mIoU', 'mDice', 'mFscore'], pre_eval=True)
Note: for new categories, remember to add the class names and their corresponding colors in the dataset definition file mmseg/datasets/voc.py, keeping categories and colors in strict one-to-one correspondence (a sketch follows). At this point, all configuration file modification work is complete.
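A minimal sketch of that edit, assuming this article's two-class example; the background color [0, 0, 0] is my assumption, and in mmseg 0.x this file defines PascalVOCDataset:

# inside mmseg/datasets/voc.py (existing file; only CLASSES/PALETTE changed)
from .builder import DATASETS
from .custom import CustomDataset

@DATASETS.register_module()
class PascalVOCDataset(CustomDataset):
    CLASSES = ('background', 'leaky oil')   # one name per class
    PALETTE = [[0, 0, 0], [128, 0, 0]]      # one RGB color per class, aligned with CLASSES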

2.4 Commands for training, testing and visualization

Training command:

bash tools/dist_train.sh my_config/fcn.py 2 --work-dir work_dir/fcn

(The first argument is the config file path, the second is the number of GPUs, and --work-dir is where checkpoints and logs are saved.)
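With a single GPU, the non-distributed entry point should work as well (same arguments, minus the GPU count):

python tools/train.py my_config/fcn.py --work-dir work_dir/fcn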

Test command:

bash tools/dist_test.sh my_config/fcn.py work_dir/fcn/latest.pth 2 --show-dir work_dir/fcn/out --eval mIoU --out result.pkl

(Note that dist_test.sh takes the GPU count as its third argument; the 2 here matches the training setup.)
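Again, the single-GPU equivalent should be:

python tools/test.py my_config/fcn.py work_dir/fcn/latest.pth --eval mIoU --show-dir work_dir/fcn/out --out result.pkl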

Visualization commands (log.json below is the *.log.json file that training writes into your work directory):

python tools/analyze_logs.py log.json --keys mIoU mAcc aAcc --legend mIoU mAcc aAcc
python tools/analyze_logs.py log.json --keys loss --legend loss

FPS measurement:

python ./tools/benchmark.py my_config/fcn.py work_dir/fcn/latest.pth 

Summary

The above is what I wanted to cover today. This article only introduced the non-invasive use of the mmsegmentation algorithm library; the model improvement methods will be introduced in the next issue.
Past review:
(1) CBAM paper interpretation + Pytorch implementation of CBAM-ResNeXt
(2) ShuffleNet-V1 paper understanding and code reproduction
(3) ShuffleNet-V2 paper understanding and code reproduction
(4) GhostNet paper understanding and code reproduction
(5) PS is a real scientific research tool, helping rapid segmentation and labeling work
(6) Quickly create your own VOC semantic segmentation dataset
Next issue notice:
mmsegmentation algorithm library arbitrary semantic segmentation algorithm improvement method

Origin: blog.csdn.net/qq_44840741/article/details/128681936