The whole process of MMDetection framework training and testing

Foreword

  • MMDetection is an object detection toolbox that contains a rich set of object detection, instance segmentation, and panoptic segmentation algorithms, along with related components and modules; see the GitHub project address.
  • Supported object detection (Object Detection) models (a selection of recent SOTA models): DAB-DETR, RTMDet, GLIP, Detic, DINO
  • Supported instance segmentation (Instance Segmentation) models (a selection of recent SOTA models): Mask2Former, BoxInst, SparseInst, RTMDet
  • Supported panoptic segmentation (Panoptic Segmentation) models: Panoptic FPN, MaskFormer, Mask2Former
  • The difference between instance segmentation and panoptic segmentation: panoptic segmentation assigns both a pixel-level semantic category and an instance identifier to every pixel, while instance segmentation only focuses on the boundaries and masks of object instances. Panoptic segmentation provides more comprehensive information and suits tasks that need fine-grained analysis of every pixel, such as autonomous driving; instance segmentation concentrates on detecting and segmenting object instances and suits tasks such as object detection and image segmentation.
  • This article mainly introduces the MMDetection training and testing workflow: it fine-tunes the RTMDet model on the Dog and Cat Detection dataset, analyzes the RTMDet model, and the final bbox_mAP metric reaches 0.952.

Environment configuration

  • The complete environment configuration code is as follows. If you don’t want to see the step-by-step analysis, you can skip the rest of this section:
import IPython.display as display

!pip install openmim
!mim install mmengine==0.7.2
# Building the wheel takes about 30 minutes; once built, put the .whl file into a separate folder
# !git clone https://github.com/open-mmlab/mmcv.git
# !cd mmcv && CUDA_HOME=/usr/local/cuda-11.8 MMCV_WITH_OPS=1 pip wheel --wheel-dir=/kaggle/working .
!pip install -q /kaggle/input/frozen-packages-mmdetection/mmcv-2.0.1-cp310-cp310-linux_x86_64.whl

!rm -rf mmdetection
!git clone https://github.com/open-mmlab/mmdetection.git
!git clone https://github.com/open-mmlab/mmyolo.git
%cd mmdetection

%pip install -e .

!pip install wandb
display.clear_output()
  • First install openmim, the package manager of the open-mmlab ecosystem, and then install the mmengine library:
!pip install openmim
!mim install mmengine==0.7.2
  • Since mmcv cannot be installed directly with mim on Kaggle (subsequent training would report an error), we have to install it by building a wheel:
!git clone https://github.com/open-mmlab/mmcv.git
!cd mmcv && CUDA_HOME=/usr/local/cuda-11.8 MMCV_WITH_OPS=1 pip wheel --wheel-dir=/kaggle/working .
  • The step above takes about 30 minutes, after which you will find mmcv-2.0.1-cp310-cp310-linux_x86_64.whl in the /kaggle/working directory, and installing it is just pip install -q /kaggle/working/mmcv-2.0.1-cp310-cp310-linux_x86_64.whl. To save time and avoid waiting on every run, I downloaded the built wheel and uploaded it as a Kaggle Dataset (the data address is provided here), so it can be installed by simply attaching that dataset. The install command therefore becomes (a quick version check follows):
!pip install -q /kaggle/input/frozen-packages-mmdetection/mmcv-2.0.1-cp310-cp310-linux_x86_64.whl
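  • To confirm the prebuilt wheel was actually picked up, a minimal sketch (assuming the install above succeeded) is to import mmcv and print its version:
import mmcv
print(mmcv.__version__)  # expect 2.0.1, matching the wheel file name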
  • Install mmdetection by cloning it with git. Because the dataset annotations use the .xml suffix, we will later need a tool from mmyolo to convert the format, so clone mmyolo as well, but do not install it.
!rm -rf mmdetection
!git clone https://github.com/open-mmlab/mmdetection.git
!git clone https://github.com/open-mmlab/mmyolo.git

# Enter the mmdetection project folder
%cd mmdetection

# Install mmdetection
%pip install -e .
!pip install wandb

import wandb
wandb.login()

Model inference

  • We first create a checkpoints folder to store the pretrained weights of the model. Because we chose the RTMDet model, we download its corresponding weights.
  • Open the mmdetection GitHub project, go to the configs/rtmdet path, and the README.md file there lists the pre-trained weights in detail.
    (Pre-trained weight table from configs/rtmdet/README.md)
  • As the table shows, the more parameters (Params) a model has, the higher its accuracy metric (box AP). We choose a model with a moderate number of parameters, RTMDet-l, whose config file is named rtmdet_l_8xb32-300e_coco.py. The name means: the RTMDet-l model, trained on the COCO dataset for 300 epochs on 8 GPUs with a batch size of 32 each. Download the weights and save them in the checkpoints folder:
!mkdir ./checkpoints
!mim download mmdet --config rtmdet_l_8xb32-300e_coco --dest ./checkpoints
  • Use the model to run inference and visualize the result:
from mmdet.apis import DetInferencer

model_name = 'rtmdet_l_8xb32-300e_coco'
checkpoint = './checkpoints/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth'

device = 'cuda:0'

inferencer = DetInferencer(model_name, checkpoint, device)

img = './demo/demo.jpg'

result = inferencer(img, out_dir='./output')
display.clear_output()

from PIL import Image
Image.open('./output/vis/demo.jpg')

(Detection result on demo.jpg)

  • If everything has worked up to this point, the environment is configured correctly and the RTMDet model has completed an inference. A short sketch of how to inspect the returned predictions follows.
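  • The inferencer also returns the predictions programmatically. A minimal sketch, assuming the MMDetection 3.x DetInferencer return format (a dict whose 'predictions' list holds per-image 'labels', 'scores' and 'bboxes'):
# Inspect the raw predictions returned by DetInferencer (field names per MMDetection 3.x)
pred = result['predictions'][0]
for label, score, bbox in zip(pred['labels'], pred['scores'], pred['bboxes']):
    if score >= 0.5:  # keep only confident detections
        print(label, round(score, 3), [round(v, 1) for v in bbox])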

Data preparation

  • File organization of the Dog and Cat Detection dataset:
 - Dog-and-Cat-Detection
     - annotations
         - Cats_Test0.xml
         - Cats_Test1.xml
         - Cats_Test2.xml
         - ...
     - images
         - Cats_Test0.png
         - Cats_Test1.png
         - Cats_Test2.png
         - ...
  • Since the dataset under the Kaggle input path is read-only and cannot be modified, and the annotations are in .xml format and need to be converted, first copy the images to the ./data/images directory (an optional count check follows the copy).
import shutil

# Copy the images to the working directory
shutil.copytree('/kaggle/input/dog-and-cat-detection/images', './data/images')
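  • A quick, optional sanity check is to count the copied images; the number should match the number of .xml annotation files:
import os
print(len(os.listdir('./data/images')), 'images copied')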
  • Since the subsequent dataset split requires the annotations in .json format, we convert the .xml files in the dog-and-cat-detection/annotations folder into a single .json file (a small sanity check follows the script).
import xml.etree.ElementTree as ET
import os
import json

coco = dict()
coco['images'] = []
coco['type'] = 'instances'
coco['annotations'] = []
coco['categories'] = []

category_set = dict()
image_set = set()

category_item_id = -1
image_id = 0
annotation_id = 0


# Register a new category and return its id
def addCatItem(name):
    global category_item_id
    category_item = dict()
    category_item['supercategory'] = 'none'
    category_item_id += 1
    category_item['id'] = category_item_id
    category_item['name'] = name
    coco['categories'].append(category_item)
    category_set[name] = category_item_id
    return category_item_id


# Register an image entry (id, file name, width, height) and return its id
def addImgItem(file_name, size):
    global image_id
    if file_name is None:
        raise Exception('Could not find filename tag in xml file.')
    if size['width'] is None:
        raise Exception('Could not find width tag in xml file.')
    if size['height'] is None:
        raise Exception('Could not find height tag in xml file.')
    image_id += 1
    image_item = dict()
    image_item['id'] = image_id
    image_item['file_name'] = file_name + ".png"
    image_item['width'] = size['width']
    image_item['height'] = size['height']
    coco['images'].append(image_item)
    image_set.add(file_name)
    return image_id


# Add one annotation; bbox is [x, y, w, h] and the segmentation is the box polygon
def addAnnoItem(object_name, image_id, category_id, bbox):
    global annotation_id
    annotation_item = dict()
    annotation_item['segmentation'] = []
    seg = []
    seg.append(bbox[0])
    seg.append(bbox[1])
    seg.append(bbox[0])
    seg.append(bbox[1] + bbox[3])
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1] + bbox[3])
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1])

    annotation_item['segmentation'].append(seg)

    annotation_item['area'] = bbox[2] * bbox[3]
    annotation_item['iscrowd'] = 0
    annotation_item['ignore'] = 0
    annotation_item['image_id'] = image_id
    annotation_item['bbox'] = bbox
    annotation_item['category_id'] = category_id
    annotation_id += 1
    annotation_item['id'] = annotation_id
    coco['annotations'].append(annotation_item)


# Parse every Pascal VOC .xml file in xml_path and fill the COCO dict
def parseXmlFiles(xml_path):
    for f in os.listdir(xml_path):
        if not f.endswith('.xml'):
            continue
        xmlname = f.split('.xml')[0]

        bndbox = dict()
        size = dict()
        current_image_id = None
        current_category_id = None
        file_name = None
        size['width'] = None
        size['height'] = None
        size['depth'] = None

        xml_file = os.path.join(xml_path, f)

        tree = ET.parse(xml_file)
        root = tree.getroot()
        if root.tag != 'annotation':
            raise Exception('pascal voc xml root element should be annotation, rather than {}'.format(root.tag))

        for elem in root:
            current_parent = elem.tag
            current_sub = None
            object_name = None

            if elem.tag == 'folder':
                continue

            if elem.tag == 'filename':
                file_name = xmlname
                if file_name in category_set:
                    raise Exception('file_name duplicated')

            elif current_image_id is None and file_name is not None and size['width'] is not None:
                if file_name not in image_set:
                    current_image_id = addImgItem(file_name, size)
                else:

                    raise Exception('duplicated image: {}'.format(file_name))

            for subelem in elem:
                bndbox['xmin'] = None
                bndbox['xmax'] = None
                bndbox['ymin'] = None
                bndbox['ymax'] = None

                current_sub = subelem.tag
                if current_parent == 'object' and subelem.tag == 'name':
                    object_name = subelem.text
                    if object_name not in category_set:
                        current_category_id = addCatItem(object_name)
                    else:
                        current_category_id = category_set[object_name]

                elif current_parent == 'size':
                    if size[subelem.tag] is not None:
                        raise Exception('xml structure broken at size tag.')
                    size[subelem.tag] = int(subelem.text)

                for option in subelem:
                    if current_sub == 'bndbox':
                        if bndbox[option.tag] is not None:
                            raise Exception('xml structure corrupted at bndbox tag.')
                        bndbox[option.tag] = int(float(option.text))

                if bndbox['xmin'] is not None:
                    if object_name is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_image_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_category_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    bbox = []
                    bbox.append(bndbox['xmin'])
                    bbox.append(bndbox['ymin'])
                    bbox.append(bndbox['xmax'] - bndbox['xmin'])
                    bbox.append(bndbox['ymax'] - bndbox['ymin'])
                    addAnnoItem(object_name, current_image_id, current_category_id, bbox)

os.makedirs('./data/annotations')
xml_path = '/kaggle/input/dog-and-cat-detection/annotations'
json_file = './data/annotations/annotations_all.json'
parseXmlFiles(xml_path)
json.dump(coco, open(json_file, 'w'))
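  • Optionally, a quick check that the generated COCO-style file looks sane (image, annotation and category counts), reusing json_file from the cell above:
with open(json_file) as f:
    coco_check = json.load(f)
print(len(coco_check['images']), 'images,',
      len(coco_check['annotations']), 'annotations,',
      len(coco_check['categories']), 'categories')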
  • File organization of the data in the current working directory:
 - mmdetection
    - data
         - annotations
             - annotations_all.json
         - images
             - Cats_Test0.png
             - Cats_Test1.png
             - Cats_Test2.png
             - ....
     - ...
  • Since we need a script from the mmyolo project to split the data into training and test sets, first enter the mmyolo project folder:
# Switch to the mmyolo project folder
%cd /kaggle/working/mmyolo
  • The split script is located at tools/misc/coco_split.py. Its parameters, from top to bottom, are: --json (path of the generated .json file); --out-dir (folder in which to store the split .json files); --ratios 0.8 0.2 (proportions of the training and test sets); --shuffle (whether to shuffle the order); --seed (random seed). A quick count check on the generated files follows the output below.
# Split into training and test sets
!python tools/misc/coco_split.py --json /kaggle/working/mmdetection/data/annotations/annotations_all.json \
                                --out-dir /kaggle/working/mmdetection/data/annotations \
                                --ratios 0.8 0.2 \
                                --shuffle \
                                --seed 2023
  • output:
Split info: ====== 
Train ratio = 0.8, number = 2949
Val ratio = 0, number = 0
Test ratio = 0.2, number = 737
Set the global seed: 2023
shuffle dataset.
Saving json to /kaggle/working/mmdetection/data/annotations/trainval.json
Saving json to /kaggle/working/mmdetection/data/annotations/test.json
All done!
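  • Optionally, confirm that the two split files match the counts in the log above:
import json
for split in ('trainval', 'test'):
    with open(f'/kaggle/working/mmdetection/data/annotations/{split}.json') as f:
        d = json.load(f)
    print(split, len(d['images']), 'images')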
  • Then switch back to the mmdetection project folder:
%cd /kaggle/working/mmdetection
  • At this point, the file organization of the data in the working directory is:
 - mmdetection
    - data
         - annotations
             - test.json
             - trainval.json
             - annotations_all.json
         - images
             - Cats_Test0.png
             - Cats_Test1.png
             - Cats_Test2.png
             - ....
     - ...

Edit RTMDet model configuration

  • The RTMDet model architecture diagram can be found in the README.md of the corresponding config folder.
    (RTMDet model architecture diagram)

  • You can open the configs/rtmdet/rtmdet_l_8xb32-300e_coco.py configuration file on GitHub (check the _base_ value; if there is an inheritance chain, follow it upwards until you reach the main file). For the RTMDet-l model this file is already the main file and can be read directly.

  • The main things we want to change are: _base_ (the inherited parent config), data_root (data storage folder), train_batch_size_per_gpu (training batch size per GPU), train_num_workers (number of worker processes, generally number of GPUs x 4), max_epochs (maximum number of epochs), base_lr (base learning rate), metainfo (class information and the palette colour for each class), train_dataloader (image path and training set annotation file), val_dataloader (image path and validation set annotation file), val_evaluator (validation set annotation file), model (number of frozen backbone stages, number of classes), param_scheduler (learning rate schedule), optim_wrapper (optimizer and learning rate assignment), default_hooks (checkpoint saving strategy), custom_hooks (data pipeline switching), load_from (path of the pre-trained weights), train_cfg (max_epochs and validation settings), randomness (fixed random seed), and visualizer (choice of visualization backend).

  • The most important items in the configuration file are the metainfo and model parameters. Be sure to check that the number of classes is correct and that the number of palette entries matches. Note: even with only one class, metainfo must be written as 'classes': ('cat', ), with the trailing comma inside the parentheses, otherwise an error is raised. The num_classes in the model's bbox_head must also match the number of classes. A minimal example is shown below.
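
  • A minimal sketch of the single-class pitfall (the single 'cat' class here is just an illustration):
# Correct: a one-element tuple needs a trailing comma.
metainfo = {
    'classes': ('cat', ),
    'palette': [(252, 215, 99), ],
}
# Wrong: ('cat') is parsed as a plain string, so the class definition breaks and an error is raised.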

  • Learning rate scaling generally follows the rule of thumb base_lr_default * (your_bs / default_bs). From the architecture diagram above we can see that RTMDet has 4 stages, and the model configuration dict(backbone=dict(frozen_stages=4), bbox_head=dict(num_classes=2)) freezes all 4 stages, i.e. the backbone network is completely frozen. A worked example of the learning rate scaling is shown below.
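
  • A worked example of the rule of thumb above, assuming the upstream rtmdet_l_8xb32-300e_coco.py config uses base_lr = 0.004 for 8 GPUs x 32 images per GPU:
default_bs = 8 * 32                       # 256, the batch size of the original config
your_bs = 1 * 24                          # 1 GPU with train_batch_size_per_gpu = 24
base_lr = 0.004 * your_bs / default_bs    # = 0.000375, the value used below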

config_animals = """
# Inherit and overwrite part of the config based on this config
_base_ = './rtmdet_l_8xb32-300e_coco.py'

data_root = './data/' # dataset root

train_batch_size_per_gpu = 24
train_num_workers = 4

max_epochs = 50
stage2_num_epochs = 6
base_lr = 0.000375


metainfo = {
    'classes': ('cat', 'dog', ),
    'palette': [
        (252, 215, 99), (153, 197, 252), 
    ]
}

train_dataloader = dict(
    batch_size=train_batch_size_per_gpu,
    num_workers=train_num_workers,
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        data_prefix=dict(img='images/'),
        ann_file='annotations/trainval.json'))

val_dataloader = dict(
    batch_size=train_batch_size_per_gpu,
    num_workers=train_num_workers,
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        data_prefix=dict(img='images/'),
        ann_file='annotations/trainval.json'))

test_dataloader = val_dataloader

val_evaluator = dict(ann_file=data_root + 'annotations/trainval.json')

test_evaluator = val_evaluator

model = dict(backbone=dict(frozen_stages=4), bbox_head=dict(num_classes=2))

# learning rate
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=1.0e-5,
        by_epoch=False,
        begin=0,
        end=1000),
    dict(
        # use cosine annealing from max_epochs // 2 to max_epochs
        type='CosineAnnealingLR',
        eta_min=base_lr * 0.05,
        begin=max_epochs // 2,
        end=max_epochs,
        T_max=max_epochs // 2,
        by_epoch=True,
        convert_to_iter_based=True),
]

train_pipeline_stage2 = [
    dict(type='LoadImageFromFile', backend_args=None),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='RandomResize',
        scale=(640, 640),
        ratio_range=(0.1, 2.0),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=(640, 640)),
    dict(type='YOLOXHSVRandomAug'),
    dict(type='RandomFlip', prob=0.5),
    dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))),
    dict(type='PackDetInputs')
]

# optimizer
optim_wrapper = dict(
    _delete_=True,
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
    paramwise_cfg=dict(
        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))

default_hooks = dict(
    checkpoint=dict(
        interval=5,
        max_keep_ckpts=2,  # only keep latest 2 checkpoints
        save_best='auto'
    ),
    logger=dict(type='LoggerHook', interval=20))

custom_hooks = [
    dict(
        type='PipelineSwitchHook',
        switch_epoch=max_epochs - stage2_num_epochs,
        switch_pipeline=train_pipeline_stage2)
]

# load COCO pre-trained weight
load_from = './checkpoints/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth'

train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=max_epochs, val_begin=20, val_interval=1)
randomness = dict(seed=2023, deterministic=True, diff_rank_seed=False)
visualizer = dict(vis_backends=[dict(type='LocalVisBackend'), dict(type='WandbVisBackend')])
"""

with open('./configs/rtmdet/rtmdet_l_1xb4-100e_animals.py', 'w') as f:
    f.write(config_animals)
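  • Optionally, load the merged config with MMEngine to confirm it parses and that the overrides took effect (the attribute paths below assume the standard MMDetection config layout):
from mmengine.config import Config

cfg = Config.fromfile('./configs/rtmdet/rtmdet_l_1xb4-100e_animals.py')
print(cfg.model.bbox_head.num_classes)   # expect 2
print(cfg.train_dataloader.batch_size)   # expect 24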

Model training

  • After completing the steps above, you can start training the model:
!python tools/train.py configs/rtmdet/rtmdet_l_1xb4-100e_animals.py
  • Model precision after 50 epochs of training:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.952
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.995
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.800
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.919
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.959
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.964
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.965
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.965
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.800
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.939
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.970
07/10 07:35:26 - mmengine - INFO - bbox_mAP_copypaste: 0.952 1.000 0.995 0.800 0.919 0.959
07/10 07:35:27 - mmengine - INFO - Epoch(val) [50][123/123]    coco/bbox_mAP: 0.9520  coco/bbox_mAP_50: 1.0000  coco/bbox_mAP_75: 0.9950  coco/bbox_mAP_s: 0.8000  coco/bbox_mAP_m: 0.9190  coco/bbox_mAP_l: 0.9590  data_time: 0.0532  time: 0.8068
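  • After training, the best checkpoint can also be evaluated offline with MMDetection's tools/test.py. The shell glob below assumes the auto-saved best_coco*.pth name; note that test_dataloader in the config above mirrors val_dataloader and points at trainval.json, so switch it to annotations/test.json first if you want metrics on the held-out split:
!python tools/test.py configs/rtmdet/rtmdet_l_1xb4-100e_animals.py \
                      work_dirs/rtmdet_l_1xb4-100e_animals/best_coco*.pth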
  • We can open the wandb platform to track training accuracy and visualize the various metrics.
    (wandb training curves and metric dashboards)

Inference with the fine-tuned model

  • After fine-tuning the model, we can run inference on images; a batch example over several images follows the two single-image examples.
from mmdet.apis import DetInferencer
import glob

config = 'configs/rtmdet/rtmdet_l_1xb4-100e_animals.py'
checkpoint = glob.glob('./work_dirs/rtmdet_l_1xb4-100e_animals/best_coco*.pth')[0]

device = 'cuda:0'

inferencer = DetInferencer(config, checkpoint, device)

img = './data/images/Cats_Test1011.png'
result = inferencer(img, out_dir='./output', pred_score_thr=0.6)

display.clear_output()
Image.open('./output/vis/Cats_Test1011.png')

(Detection result on Cats_Test1011.png)

img = './data/images/Cats_Test1035.png'
result = inferencer(img, out_dir='./output', pred_score_thr=0.6)

display.clear_output()
Image.open('./output/vis/Cats_Test1035.png')

(Detection result on Cats_Test1035.png)
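
  • DetInferencer can also be given a list of image paths (per the MMDetection 3.x API), so several test images can be processed in one call; the glob pattern below is only an example:
imgs = glob.glob('./data/images/Cats_Test10*.png')[:4]
results = inferencer(imgs, out_dir='./output', pred_score_thr=0.6)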

Origin: https://blog.csdn.net/qq_20144897/article/details/131717202