Foreword
- MMDetection is an object detection toolbox that contains a wealth of object detection, instance segmentation, and panoptic segmentation algorithms, together with related components and modules. The project is hosted on GitHub at https://github.com/open-mmlab/mmdetection.
- Supported object detection (Object Detection) models (some SOTA models in recent years): DAB-DETR, RTMDet, GLIP, Detic, DINO
- Supported instance segmentation (Instance Segmentation) models (some SOTA models in recent years): Mask2former, BoxInst, SparseInst, RTMDet
- Supported panoptic segmentation (Panoptic Segmentation) models: Panoptic FPN, MaskFormer, Mask2Former
- The difference between instance segmentation and panoptic segmentation: panoptic segmentation provides both a pixel-level semantic category and an instance identifier for every pixel, while instance segmentation only focuses on the boundaries and masks of object instances. Panoptic segmentation provides more comprehensive information and is suitable for tasks that require fine-grained analysis of each pixel, such as autonomous driving. Instance segmentation is more focused on detecting and segmenting individual object instances.
- This article mainly introduces the MMDetection training and testing process: we fine-tune the RTMDet model on the Dog and Cat Detection dataset, analyze the RTMDet model, and reach a final bbox_mAP of 0.952.
Environment configuration
- The complete environment configuration code is as follows. If you don’t want to see the step-by-step analysis, you can skip the rest of this section:
import IPython.display as display
!pip install openmim
!mim install mmengine==0.7.2
# Building the wheel takes about 30 minutes; once it is built, put the whl file in a separate folder
# !git clone https://github.com/open-mmlab/mmcv.git
# !cd mmcv && CUDA_HOME=/usr/local/cuda-11.8 MMCV_WITH_OPS=1 pip wheel --wheel-dir=/kaggle/working .
!pip install -q /kaggle/input/frozen-packages-mmdetection/mmcv-2.0.1-cp310-cp310-linux_x86_64.whl
!rm -rf mmdetection
!git clone https://github.com/open-mmlab/mmdetection.git
!git clone https://github.com/open-mmlab/mmyolo.git
%cd mmdetection
%pip install -e .
!pip install wandb
display.clear_output()
- First install openmim, the OpenMMLab package management library, and then install the mmengine library. The code is as follows:
!pip install openmim
!mim install mmengine==0.7.2
- Since mmcv cannot be installed directly with mim on Kaggle (subsequent training will report an error), we can only install it by building a wheel. The code is as follows:
!git clone https://github.com/open-mmlab/mmcv.git
!cd mmcv && CUDA_HOME=/usr/local/cuda-11.8 MMCV_WITH_OPS=1 pip wheel --wheel-dir=/kaggle/working .
- The above step takes about 30 minutes, after which you will find the file mmcv-2.0.1-cp310-cp310-linux_x86_64.whl in the /kaggle/working directory; install it with pip install -q /kaggle/working/mmcv-2.0.1-cp310-cp310-linux_x86_64.whl. To avoid waiting this long on every run, I downloaded the built wheel and uploaded it to Kaggle Datasets, so that it can be installed simply by attaching the dataset each time (the dataset address is provided here). So the install code becomes:
!pip install -q /kaggle/input/frozen-packages-mmdetection/mmcv-2.0.1-cp310-cp310-linux_x86_64.whl
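A prebuilt wheel only installs when its tags match the running interpreter, so it can be worth checking the file name before relying on it. The following is a minimal sketch, assuming the standard five-part wheel naming scheme (name, version, Python tag, ABI tag, platform tag); `parse_wheel_name` and `matches_current_python` are hypothetical helper names, not part of pip or openmim.

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel file name into its five standard components."""
    if not filename.endswith('.whl'):
        raise ValueError('not a wheel file: {}'.format(filename))
    parts = filename[:-len('.whl')].split('-')
    if len(parts) != 5:
        # Wheels with an optional build tag have six parts; not handled here.
        raise ValueError('unexpected wheel name: {}'.format(filename))
    keys = ('name', 'version', 'python_tag', 'abi_tag', 'platform_tag')
    return dict(zip(keys, parts))


def matches_current_python(info):
    """Check whether the wheel's CPython tag matches the running interpreter."""
    current = 'cp{}{}'.format(sys.version_info.major, sys.version_info.minor)
    return info['python_tag'] == current


info = parse_wheel_name('mmcv-2.0.1-cp310-cp310-linux_x86_64.whl')
print(info)
print('matches this interpreter:', matches_current_python(info))
```

If the Python tag does not match (for example after the Kaggle image is upgraded), the wheel would need to be rebuilt with the steps above.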
- Install mmdetection via git clone. Because the dataset annotations use the .xml suffix, we will need a tool from mmyolo to process the annotation files later, so clone mmyolo at the same time, but do not install it.
!rm -rf mmdetection
!git clone https://github.com/open-mmlab/mmdetection.git
!git clone https://github.com/open-mmlab/mmyolo.git
# Enter the mmdetection project folder
%cd mmdetection
# Install mmdetection
%pip install -e .
- If there are pycocotools installation problems during the installation process, you can refer to my previous article MMYOLO framework labeling, training, and testing the whole process (supplement) , which contains detailed solutions.
- Because various metrics need to be visualized during training, install the wandb package and log in.
!pip install wandb
import wandb
wandb.login()
Model inference
- We first create a folder named checkpoints to store the pretrained weights of the model. Because we chose the RTMDet model, we download the corresponding weights.
- Open the mmdetection GitHub project page and go to the configs/rtmdet path; the README.md file there lists the pretrained weights in detail.
- It can be seen that the more parameters a model has (Params), the higher its accuracy metric (box AP). We choose RTMDet-l, a model with a moderate number of parameters. The corresponding config file is rtmdet_l_8xb32-300e_coco.py, which means: the RTMDet-l model, trained on 8 GPUs with a batch size of 32 each, for 300 epochs on the COCO dataset.
- Download the weights and save them in the checkpoints folder:
!mkdir ./checkpoints
!mim download mmdet --config rtmdet_l_8xb32-300e_coco --dest ./checkpoints
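The naming convention described above can be decoded mechanically. The sketch below is only an illustration: the regular expression and the `parse_config_name` helper are assumptions made for this example and cover names shaped like the one used here, not every config name in the repository.

```python
import re

# Pattern for names such as "rtmdet_l_8xb32-300e_coco":
# {model}_{size}_{gpus}xb{batch per GPU}-{epochs}e_{dataset}
CONFIG_NAME = re.compile(
    r'^(?P<model>[a-z0-9]+)_(?P<size>[a-z0-9]+)_'
    r'(?P<gpus>\d+)xb(?P<batch_per_gpu>\d+)-(?P<epochs>\d+)e_(?P<dataset>\w+)$')


def parse_config_name(name):
    """Decode a config name into its training-setup fields."""
    match = CONFIG_NAME.match(name)
    if match is None:
        raise ValueError('unrecognized config name: {}'.format(name))
    fields = match.groupdict()
    for key in ('gpus', 'batch_per_gpu', 'epochs'):
        fields[key] = int(fields[key])
    # Total batch size across all GPUs, useful later for learning rate scaling.
    fields['total_batch_size'] = fields['gpus'] * fields['batch_per_gpu']
    return fields


setup = parse_config_name('rtmdet_l_8xb32-300e_coco')
print(setup)
```

The total batch size (8 x 32 = 256) is the value against which a new learning rate is scaled when training with a different batch size.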
- Use the model to run inference and visualize the results:
from mmdet.apis import DetInferencer
model_name = 'rtmdet_l_8xb32-300e_coco'
checkpoint = './checkpoints/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth'
device = 'cuda:0'
inferencer = DetInferencer(model_name, checkpoint, device)
img = './demo/demo.jpg'
result = inferencer(img, out_dir='./output')
display.clear_output()
from PIL import Image
Image.open('./output/vis/demo.jpg')
- If there are no problems up to this point, the environment is configured correctly and the RTMDet model can run inference.
Data collation
- The Dog and Cat Detection dataset is organized as follows:
- Dog-and-Cat-Detection
  - annotations
    - Cats_Test0.xml
    - Cats_Test1.xml
    - Cats_Test2.xml
    - ...
  - images
    - Cats_Test0.png
    - Cats_Test1.png
    - Cats_Test2.png
    - ...
- Since the dataset under the Kaggle input path is read-only and cannot be modified, and the annotation files are in .xml format and need to be converted, first copy the images to the ./data/images directory.
import shutil
# Copy the files to the working directory
shutil.copytree('/kaggle/input/dog-and-cat-detection/images', './data/images')
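When the notebook cell is re-run, `shutil.copytree` raises `FileExistsError` if the destination already exists. A minimal sketch of a re-runnable copy follows, assuming Python 3.8 or later (where the `dirs_exist_ok` parameter is available); `copy_images` is a hypothetical helper, and temporary folders stand in for the Kaggle paths.

```python
import os
import shutil
import tempfile


def copy_images(src_dir, dst_dir):
    """Copy src_dir to dst_dir, tolerating an existing destination."""
    shutil.copytree(src_dir, dst_dir, dirs_exist_ok=True)
    return sorted(os.listdir(dst_dir))


# Demonstration on temporary folders standing in for the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'input', 'images')
    dst = os.path.join(tmp, 'data', 'images')
    os.makedirs(src)
    for name in ('Cats_Test0.png', 'Cats_Test1.png'):
        with open(os.path.join(src, name), 'wb') as f:
            f.write(b'')  # empty placeholder file
    first = copy_images(src, dst)
    second = copy_images(src, dst)  # running again does not raise
    print(first, second)
```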
- Since the subsequent dataset split requires the annotations in .json format, we convert the .xml files in the dog-and-cat-detection/annotations folder into a single .json file.
import xml.etree.ElementTree as ET
import os
import json

coco = dict()
coco['images'] = []
coco['type'] = 'instances'
coco['annotations'] = []
coco['categories'] = []
category_set = dict()
image_set = set()
category_item_id = -1
image_id = 0
annotation_id = 0


def addCatItem(name):
    global category_item_id
    category_item = dict()
    category_item['supercategory'] = 'none'
    category_item_id += 1
    category_item['id'] = category_item_id
    category_item['name'] = name
    coco['categories'].append(category_item)
    category_set[name] = category_item_id
    return category_item_id


def addImgItem(file_name, size):
    global image_id
    if file_name is None:
        raise Exception('Could not find filename tag in xml file.')
    if size['width'] is None:
        raise Exception('Could not find width tag in xml file.')
    if size['height'] is None:
        raise Exception('Could not find height tag in xml file.')
    image_id += 1
    image_item = dict()
    image_item['id'] = image_id
    image_item['file_name'] = file_name + ".png"
    image_item['width'] = size['width']
    image_item['height'] = size['height']
    coco['images'].append(image_item)
    image_set.add(file_name)
    return image_id


def addAnnoItem(object_name, image_id, category_id, bbox):
    global annotation_id
    annotation_item = dict()
    annotation_item['segmentation'] = []
    seg = []
    seg.append(bbox[0])
    seg.append(bbox[1])
    seg.append(bbox[0])
    seg.append(bbox[1] + bbox[3])
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1] + bbox[3])
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1])
    annotation_item['segmentation'].append(seg)
    annotation_item['area'] = bbox[2] * bbox[3]
    annotation_item['iscrowd'] = 0
    annotation_item['ignore'] = 0
    annotation_item['image_id'] = image_id
    annotation_item['bbox'] = bbox
    annotation_item['category_id'] = category_id
    annotation_id += 1
    annotation_item['id'] = annotation_id
    coco['annotations'].append(annotation_item)


def parseXmlFiles(xml_path):
    for f in os.listdir(xml_path):
        if not f.endswith('.xml'):
            continue
        xmlname = f.split('.xml')[0]
        bndbox = dict()
        size = dict()
        current_image_id = None
        current_category_id = None
        file_name = None
        size['width'] = None
        size['height'] = None
        size['depth'] = None
        xml_file = os.path.join(xml_path, f)
        tree = ET.parse(xml_file)
        root = tree.getroot()
        if root.tag != 'annotation':
            raise Exception('pascal voc xml root element should be annotation, rather than {}'.format(root.tag))
        for elem in root:
            current_parent = elem.tag
            current_sub = None
            object_name = None
            if elem.tag == 'folder':
                continue
            if elem.tag == 'filename':
                file_name = xmlname
                if file_name in category_set:
                    raise Exception('file_name duplicated')
            elif current_image_id is None and file_name is not None and size['width'] is not None:
                if file_name not in image_set:
                    current_image_id = addImgItem(file_name, size)
                else:
                    raise Exception('duplicated image: {}'.format(file_name))
            for subelem in elem:
                bndbox['xmin'] = None
                bndbox['xmax'] = None
                bndbox['ymin'] = None
                bndbox['ymax'] = None
                current_sub = subelem.tag
                if current_parent == 'object' and subelem.tag == 'name':
                    object_name = subelem.text
                    if object_name not in category_set:
                        current_category_id = addCatItem(object_name)
                    else:
                        current_category_id = category_set[object_name]
                elif current_parent == 'size':
                    if size[subelem.tag] is not None:
                        raise Exception('xml structure broken at size tag.')
                    size[subelem.tag] = int(subelem.text)
                for option in subelem:
                    if current_sub == 'bndbox':
                        if bndbox[option.tag] is not None:
                            raise Exception('xml structure corrupted at bndbox tag.')
                        bndbox[option.tag] = int(float(option.text))
                if bndbox['xmin'] is not None:
                    if object_name is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_image_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_category_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    bbox = []
                    bbox.append(bndbox['xmin'])
                    bbox.append(bndbox['ymin'])
                    bbox.append(bndbox['xmax'] - bndbox['xmin'])
                    bbox.append(bndbox['ymax'] - bndbox['ymin'])
                    addAnnoItem(object_name, current_image_id, current_category_id, bbox)


os.makedirs('./data/annotations')
xml_path = '/kaggle/input/dog-and-cat-detection/annotations'
json_file = './data/annotations/annotations_all.json'
parseXmlFiles(xml_path)
json.dump(coco, open(json_file, 'w'))
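The core of the conversion is the change of box representation: the .xml files store corner coordinates (xmin, ymin, xmax, ymax), while the COCO-style .json stores [x, y, width, height]. The sketch below isolates that step and the rectangular polygon written to the segmentation field; `voc_to_coco_bbox` and `bbox_to_polygon` are hypothetical helper names, and the check for degenerate boxes is an addition that the script above does not perform.

```python
def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert corner coordinates to COCO [x, y, width, height]."""
    if xmax <= xmin or ymax <= ymin:
        # Extra safety check, not present in the conversion script.
        raise ValueError('degenerate box: {}'.format((xmin, ymin, xmax, ymax)))
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def bbox_to_polygon(bbox):
    """Build the 4-corner polygon in the same corner order as addAnnoItem."""
    x, y, w, h = bbox
    return [x, y, x, y + h, x + w, y + h, x + w, y]


bbox = voc_to_coco_bbox(10, 20, 110, 70)
polygon = bbox_to_polygon(bbox)
area = bbox[2] * bbox[3]
print(bbox, polygon, area)
```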
- The data in the current working directory is now organized as follows:
- mmdetection
  - data
    - annotations
      - annotations_all.json
    - images
      - Cats_Test0.png
      - Cats_Test1.png
      - Cats_Test2.png
      - ....
  - ...
- Since we need a script from the mmyolo project to split the data into training and test sets, first enter the mmyolo project folder:
# Switch to the mmyolo project folder
%cd /kaggle/working/mmyolo
- The split script is located at tools/misc/coco_split.py, and its parameters from top to bottom are: --json (path of the generated .json file); --out-dir (folder in which the split .json files are stored); --ratios 0.8 0.2 (proportions of the training set and the test set); --shuffle (whether to shuffle the order); --seed (random seed)
# Split into training and test sets
!python tools/misc/coco_split.py --json /kaggle/working/mmdetection/data/annotations/annotations_all.json \
    --out-dir /kaggle/working/mmdetection/data/annotations \
    --ratios 0.8 0.2 \
    --shuffle \
    --seed 2023
- Output:
Split info: ======
Train ratio = 0.8, number = 2949
Val ratio = 0, number = 0
Test ratio = 0.2, number = 737
Set the global seed: 2023
shuffle dataset.
Saving json to /kaggle/working/mmdetection/data/annotations/trainval.json
Saving json to /kaggle/working/mmdetection/data/annotations/test.json
All done!
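To make the numbers in this output easier to read, here is a minimal sketch of a seeded ratio split over image ids. It is not the mmyolo implementation: `split_ids` is a hypothetical helper, and the rounding rule (round the test count down, give the remainder to training) is an assumption chosen because it reproduces the 2949 / 737 counts for the 3686 images implied by the log.

```python
import random


def split_ids(image_ids, ratios, shuffle=True, seed=None):
    """Split ids into (train, test) lists according to (train, test) ratios."""
    ratio_train, ratio_test = ratios
    ids = list(image_ids)
    if shuffle:
        # A dedicated Random instance keeps the split reproducible
        # without touching the global random state.
        random.Random(seed).shuffle(ids)
    num_test = int(len(ids) * ratio_test)
    num_train = len(ids) - num_test
    return ids[:num_train], ids[num_train:]


# 3686 = 2949 + 737, the total implied by the split output.
train_ids, test_ids = split_ids(range(1, 3687), (0.8, 0.2), seed=2023)
print(len(train_ids), len(test_ids))
```

Because the shuffle is seeded, running the split again with the same seed yields the same two sets, which keeps experiments comparable.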
- Then switch back to the mmdetection project folder:
%cd /kaggle/working/mmdetection
- At this point, the data in the working directory is organized as follows:
- mmdetection
  - data
    - annotations
      - test.json
      - trainval.json
      - annotations_all.json
    - images
      - Cats_Test0.png
      - Cats_Test1.png
      - Cats_Test2.png
      - ....
  - ...
Edit RTMDet model configuration
- The RTMDet model architecture diagram can be found in the README.md of the corresponding config folder.
- You can open the configuration file configs/rtmdet/rtmdet_l_8xb32-300e_coco.py on GitHub (check the _base_ value; if the file inherits from another one, follow the chain upward until you find the main file). The RTMDet-l config is already the main file and can be viewed directly.
- The main things we want to change are:
  - _base_ (the parent config to inherit from)
  - data_root (data storage folder)
  - train_batch_size_per_gpu (training batch size per GPU)
  - train_num_workers (number of data-loading workers, generally n GPU x 4)
  - max_epochs (maximum number of epochs)
  - base_lr (base learning rate)
  - metainfo (class names and the palette color for each category)
  - train_dataloader (image path and training set annotation file)
  - val_dataloader (image path and validation set annotation file)
  - val_evaluator (validation set annotation file)
  - model (number of frozen backbone stages, number of categories)
  - param_scheduler (learning rate decay schedule)
  - optim_wrapper (optimizer and learning rate settings)
  - default_hooks (checkpoint saving strategy)
  - custom_hooks (data pipeline switching)
  - load_from (path of the pretrained weights to load)
  - train_cfg (sets max_epochs and the validation schedule)
  - randomness (fixed random seed)
  - visualizer (visualization backend)
- The most important parts of the configuration file are the metainfo and model parameters. Be sure to check that the number of categories is correct and that the palette has the same number of entries. Note: even if there is only 1 category, metainfo should be written as 'classes': ('cat', ), with a comma inside the parentheses, otherwise an error will be reported. num_classes in the bbox_head of model should also match the number of categories.
- Learning rate scaling generally follows a rule of thumb: base_lr_default * (your_bs / default_bs). From the architecture diagram, we can see that the RTMDet backbone has 4 stages, so the model configuration dict(backbone=dict(frozen_stages=4), bbox_head=dict(num_classes=2)) freezes all 4 stages, that is, the backbone network is completely frozen.
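The learning rate rule of thumb can be worked through numerically. In the sketch below, the default base learning rate of 0.004 is an assumption about the inherited RTMDet-l config (it is not stated in this article), the default total batch size 8 x 32 comes from the config name, and training on a single GPU is also assumed; `scale_lr` is a hypothetical helper.

```python
def scale_lr(default_lr, default_total_bs, new_total_bs):
    """Linear scaling rule: base_lr_default * (your_bs / default_bs)."""
    return default_lr * new_total_bs / default_total_bs


default_lr = 0.004            # assumed default base_lr of the inherited config
default_total_bs = 8 * 32     # 8 GPUs x 32 images, from "8xb32"
new_total_bs = 1 * 24         # assumed 1 GPU x train_batch_size_per_gpu = 24

base_lr = scale_lr(default_lr, default_total_bs, new_total_bs)
print(base_lr)
```

Under these assumptions the result agrees with the base_lr = 0.000375 used in the configuration.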
config_animals = """
# Inherit and overwrite part of the config based on this config
_base_ = './rtmdet_l_8xb32-300e_coco.py'

data_root = './data/'  # dataset root

train_batch_size_per_gpu = 24
train_num_workers = 4

max_epochs = 50
stage2_num_epochs = 6
base_lr = 0.000375

metainfo = {
    'classes': ('cat', 'dog', ),
    'palette': [
        (252, 215, 99), (153, 197, 252),
    ]
}

train_dataloader = dict(
    batch_size=train_batch_size_per_gpu,
    num_workers=train_num_workers,
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        data_prefix=dict(img='images/'),
        ann_file='annotations/trainval.json'))

val_dataloader = dict(
    batch_size=train_batch_size_per_gpu,
    num_workers=train_num_workers,
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        data_prefix=dict(img='images/'),
        ann_file='annotations/trainval.json'))

test_dataloader = val_dataloader

val_evaluator = dict(ann_file=data_root + 'annotations/trainval.json')

test_evaluator = val_evaluator

model = dict(backbone=dict(frozen_stages=4), bbox_head=dict(num_classes=2))

# learning rate
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=1.0e-5,
        by_epoch=False,
        begin=0,
        end=1000),
    dict(
        # use cosine lr over the second half of training (epoch 25 to 50)
        type='CosineAnnealingLR',
        eta_min=base_lr * 0.05,
        begin=max_epochs // 2,
        end=max_epochs,
        T_max=max_epochs // 2,
        by_epoch=True,
        convert_to_iter_based=True),
]

train_pipeline_stage2 = [
    dict(type='LoadImageFromFile', backend_args=None),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='RandomResize',
        scale=(640, 640),
        ratio_range=(0.1, 2.0),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=(640, 640)),
    dict(type='YOLOXHSVRandomAug'),
    dict(type='RandomFlip', prob=0.5),
    dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))),
    dict(type='PackDetInputs')
]

# optimizer
optim_wrapper = dict(
    _delete_=True,
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
    paramwise_cfg=dict(
        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))

default_hooks = dict(
    checkpoint=dict(
        interval=5,
        max_keep_ckpts=2,  # only keep latest 2 checkpoints
        save_best='auto'
    ),
    logger=dict(type='LoggerHook', interval=20))

custom_hooks = [
    dict(
        type='PipelineSwitchHook',
        switch_epoch=max_epochs - stage2_num_epochs,
        switch_pipeline=train_pipeline_stage2)
]

# load COCO pre-trained weight
load_from = './checkpoints/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth'

train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=max_epochs, val_begin=20, val_interval=1)
randomness = dict(seed=2023, deterministic=True, diff_rank_seed=False)
visualizer = dict(vis_backends=[dict(type='LocalVisBackend'), dict(type='WandbVisBackend')])
"""

with open('./configs/rtmdet/rtmdet_l_1xb4-100e_animals.py', 'w') as f:
    f.write(config_animals)
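Since a mismatch between the class list, the palette, and num_classes is the most common mistake in this configuration, a quick check before training can save a failed run. The following is a minimal sketch in plain Python; `check_metainfo` is a hypothetical helper and does not call any MMDetection API.

```python
def check_metainfo(metainfo, num_classes):
    """Verify that classes, palette and num_classes agree with each other."""
    classes = metainfo['classes']
    palette = metainfo['palette']
    if isinstance(classes, str):
        # ('cat') without a trailing comma is just the string 'cat'.
        raise TypeError("classes must be a tuple, e.g. ('cat', )")
    if len(classes) != num_classes:
        raise ValueError('num_classes does not match the number of classes')
    if len(palette) != len(classes):
        raise ValueError('palette needs exactly one color per class')
    return True


metainfo = {
    'classes': ('cat', 'dog', ),
    'palette': [(252, 215, 99), (153, 197, 252)],
}
ok = check_metainfo(metainfo, num_classes=2)

single_wrong = ('cat')    # a plain string, not a tuple
single_right = ('cat', )  # a one-element tuple
print(ok, type(single_wrong).__name__, type(single_right).__name__)
```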
Model training
- After completing the steps above, you can start training the model:
!python tools/train.py configs/rtmdet/rtmdet_l_1xb4-100e_animals.py
- Model accuracy at epoch = 50:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.952
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.995
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.800
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.919
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.959
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.964
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.965
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.965
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.800
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.939
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.970
07/10 07:35:26 - mmengine - INFO - bbox_mAP_copypaste: 0.952 1.000 0.995 0.800 0.919 0.959
07/10 07:35:27 - mmengine - INFO - Epoch(val) [50][123/123] coco/bbox_mAP: 0.9520 coco/bbox_mAP_50: 1.0000 coco/bbox_mAP_75: 0.9950 coco/bbox_mAP_s: 0.8000 coco/bbox_mAP_m: 0.9190 coco/bbox_mAP_l: 0.9590 data_time: 0.0532 time: 0.8068
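To compare runs without reading the log by eye, the metrics can be pulled out of the validation line with a regular expression. The sketch below assumes the log line has exactly the shape shown above; `parse_metrics` is a hypothetical helper, not part of mmengine.

```python
import re

# Matches keys such as "coco/bbox_mAP" or "coco/bbox_mAP_50" and their values.
METRIC = re.compile(r'(coco/bbox_mAP(?:_\w+)?): (\d+\.\d+)')


def parse_metrics(log_line):
    """Return a dict of COCO bbox metrics found in one validation log line."""
    return {key: float(value) for key, value in METRIC.findall(log_line)}


line = ('07/10 07:35:27 - mmengine - INFO - Epoch(val) [50][123/123] '
        'coco/bbox_mAP: 0.9520 coco/bbox_mAP_50: 1.0000 '
        'coco/bbox_mAP_75: 0.9950 coco/bbox_mAP_s: 0.8000 '
        'coco/bbox_mAP_m: 0.9190 coco/bbox_mAP_l: 0.9590 '
        'data_time: 0.0532 time: 0.8068')
metrics = parse_metrics(line)
print(metrics)
```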
- We can open the wandb platform to track training accuracy and visualize various metrics.
Model inference
- After fine-tuning the model, we can run inference on images:
from mmdet.apis import DetInferencer
import glob
config = 'configs/rtmdet/rtmdet_l_1xb4-100e_animals.py'
checkpoint = glob.glob('./work_dirs/rtmdet_l_1xb4-100e_animals/best_coco*.pth')[0]
device = 'cuda:0'
inferencer = DetInferencer(config, checkpoint, device)
img = './data/images/Cats_Test1011.png'
result = inferencer(img, out_dir='./output', pred_score_thr=0.6)
display.clear_output()
Image.open('./output/vis/Cats_Test1011.png')
img = './data/images/Cats_Test1035.png'
result = inferencer(img, out_dir='./output', pred_score_thr=0.6)
display.clear_output()
Image.open('./output/vis/Cats_Test1035.png')
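Beyond the saved visualization, the detections can also be processed as data. The sketch below assumes that the inferencer result contains, per image, parallel lists of labels, scores, and bboxes (for example under result['predictions'][0]); this structure, the `filter_predictions` helper, and the sample values are all assumptions made for illustration, so check the actual return value before relying on it.

```python
CLASSES = ('cat', 'dog')  # same order as metainfo['classes']


def filter_predictions(pred, score_thr=0.6, classes=CLASSES):
    """Keep detections whose score reaches the threshold, with class names."""
    kept = []
    for label, score, bbox in zip(pred['labels'], pred['scores'], pred['bboxes']):
        if score >= score_thr:
            kept.append({'class': classes[label], 'score': score, 'bbox': bbox})
    return kept


# Made-up prediction standing in for a real inferencer output.
fake_pred = {
    'labels': [0, 1, 0],
    'scores': [0.93, 0.41, 0.67],
    'bboxes': [[12.0, 30.0, 200.0, 240.0],
               [5.0, 5.0, 50.0, 60.0],
               [100.0, 80.0, 180.0, 160.0]],
}
kept = filter_predictions(fake_pred, score_thr=0.6)
for det in kept:
    print(det['class'], det['score'], det['bbox'])
```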