AI Practical Training Camp & MMDetection Installation and Configuration Guide
1. Introduction to MMDetection
MMDetection is a widely used detection toolbox covering object detection, instance segmentation, panoptic segmentation, and other general detection tasks. It supports 75+ mainstream and cutting-edge models and provides more than 440 pre-trained models, and it is used extensively in both academic research and industrial deployment. The main features of the framework are:
- Modular design
MMDetection decouples the detection framework into separate modules. By combining different modules, users can easily build customized detection models.
- Support for a variety of detection tasks
MMDetection supports a variety of detection tasks, including object detection, instance segmentation, panoptic segmentation, and semi-supervised object detection. In the future, the project will focus on supporting multi-modal general detection.
- Fast speed
Basic box and mask operations have GPU implementations, and training speed is faster than or comparable to other codebases.
- High performance
The MMDetection codebase grew out of the code developed by the MMDet team, which won the COCO 2018 object detection challenge, and it has been continuously improved since then. The newly released RTMDet also achieves state-of-the-art results in real-time instance segmentation and rotated object detection, along with the best trade-off between parameter count and accuracy among object detection models.
Changes from version 2.0 to 3.0
Building on MMDetection 2.0, version 3.0 decouples the framework into finer-grained modules. Abstractions such as datasets, data transforms, models, evaluation, and the visualizer have been separated further and given unified interfaces. The unified data flow and fine-grained modules make it much easier to extend the framework to new tasks. The codebase has been fully adapted to the new training engine MMEngine and the foundational computer vision library MMCV. After each model component was refactored and optimized, the speed and accuracy of MMDetection improved across the board, reaching the best level among existing detection frameworks.
MMDetection repository: https://github.com/open-mmlab/mmdetection
MMDetection official documentation: https://mmdetection.readthedocs.io/en/latest/
2. Environment check and installation
First, run the following commands in Jupyter. You can also run them in a terminal; just remove the leading "!" character. They print information about your machine.
# Check nvcc version
!nvcc -V
# Check GCC version
!gcc --version
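# Install the mmengine and mmcv dependencies
# To keep later version changes from breaking the code, the versions are meant to be pinned for now
# (note: the commands below carry no version specifiers, so add them if you need reproducibility)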
!pwd
%pip install -U "openmim"
!mim install "mmengine"
!mim install "mmcv"
# Install mmdetection
!rm -rf mmdetection
!git clone https://github.com/open-mmlab/mmdetection.git
%cd mmdetection
%pip install -e .
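The compiler checks at the start of this section can also be run from plain Python, which is handy outside Jupyter. The following is a minimal sketch using only the standard library; the helper name tool_output is made up for this example and is not part of MMDetection.

```python
import shutil
import subprocess


def tool_output(command, flag):
    """Return the text printed by `command flag`, or None if the tool is missing.

    Hypothetical helper for illustration; not an MMDetection function.
    """
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, flag], capture_output=True, text=True, check=False)
    # Some tools print their version to stderr instead of stdout.
    return (result.stdout or result.stderr).strip()


for command, flag in (('nvcc', '-V'), ('gcc', '--version')):
    output = tool_output(command, flag)
    if output is None:
        print(f'{command}: not found on PATH')
    else:
        print(f'{command}:\n{output}')
```

If nvcc is reported as missing, the CUDA toolkit is probably not installed or not on PATH.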
Use the following code to print the environment information:
from mmengine.utils import get_git_hash
from mmengine.utils.dl_utils import collect_env as collect_base_env

import mmdet


# Collect and print environment information
def collect_env():
    """Collect the information of the running environments."""
    env_info = collect_base_env()
    env_info['MMDetection'] = f'{mmdet.__version__}+{get_git_hash()[:7]}'
    return env_info


if __name__ == '__main__':
    for name, val in collect_env().items():
        print(f'{name}: {val}')
3. Prepare the dataset
First, change into the MMDetection directory and download the dataset.
The prepared data is in COCO format.
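For readers new to COCO format, the sketch below shows the three top-level lists a COCO annotation file contains and how boxes are stored as [x, y, width, height]. The file name, sizes, and ids are invented for illustration and are not taken from the cat dataset; the two helper functions are hypothetical as well.

```python
# Hypothetical single-image example; every value here is made up.
coco_example = {
    'images': [
        {'id': 1, 'file_name': 'example.jpg', 'width': 640, 'height': 480},
    ],
    'annotations': [
        {
            'id': 1,
            'image_id': 1,      # refers to images[].id
            'category_id': 1,   # refers to categories[].id
            'bbox': [100.0, 120.0, 200.0, 150.0],  # [x, y, width, height]
            'area': 200.0 * 150.0,
            'iscrowd': 0,
        },
    ],
    'categories': [{'id': 1, 'name': 'cat'}],
}


def bbox_xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, w, h] box to corner form [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def check_coco(data):
    """Return a list of problems found in a COCO-style dict (empty if none)."""
    problems = []
    image_ids = {img['id'] for img in data['images']}
    category_ids = {cat['id'] for cat in data['categories']}
    for ann in data['annotations']:
        if ann['image_id'] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id")
        if ann['category_id'] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        _, _, w, h = ann['bbox']
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: non-positive box size")
    return problems


print(check_coco(coco_example))
print(bbox_xywh_to_xyxy(coco_example['annotations'][0]['bbox']))
```

To run the same check on a real annotation file, load it with json.load and pass the resulting dict to check_coco.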
Use the following code to preview the data; only 8 images are shown.
import os
import matplotlib.pyplot as plt
from PIL import Image
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

plt.figure(figsize=(16, 5))

# Take the first 8 files in the image folder
image_paths = os.listdir('cat_dataset/images')[:8]

for i, filename in enumerate(image_paths):
    image = Image.open('cat_dataset/images/' + filename).convert("RGB")

    # Draw each image in a 2 x 4 grid with its file name as the title
    plt.subplot(2, 4, i + 1)
    plt.imshow(image)
    plt.title(f"{filename}")
    plt.xticks([])
    plt.yticks([])

plt.tight_layout()
from pycocotools.coco import COCO
from PIL import Image
import numpy as np
import os.path as osp
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon


def apply_exif_orientation(image):
    _EXIF_ORIENT = 274
    if not hasattr(image, 'getexif'):
        return image

    try:
        exif = image.getexif()
    except Exception:
        exif = None

    if exif is None:
        return image

    orientation = exif.get(_EXIF_ORIENT)

    method = {
        2: Image.FLIP_LEFT_RIGHT,
        3: Image.ROTATE_180,
        4: Image.FLIP_TOP_BOTTOM,
        5: Image.TRANSPOSE,
        6: Image.ROTATE_270,
        7: Image.TRANSVERSE,
        8: Image.ROTATE_90,
    }.get(orientation)
    if method is not None:
        return image.transpose(method)
    return image


def show_bbox_only(coco, anns, show_label_bbox=True, is_filling=True):
    """Show bounding box of annotations Only."""
    if len(anns) == 0:
        return

    ax = plt.gca()
    ax.set_autoscale_on(False)

    image2color = dict()
    for cat in coco.getCatIds():
        image2color[cat] = (np.random.random((1, 3)) * 0.7 + 0.3).tolist()[0]

    polygons = []
    colors = []

    for ann in anns:
        color = image2color[ann['category_id']]
        bbox_x, bbox_y, bbox_w, bbox_h = ann['bbox']
        poly = [[bbox_x, bbox_y], [bbox_x, bbox_y + bbox_h],
                [bbox_x + bbox_w, bbox_y + bbox_h], [bbox_x + bbox_w, bbox_y]]
        polygons.append(Polygon(np.array(poly).reshape((4, 2))))
        colors.append(color)

        if show_label_bbox:
            label_bbox = dict(facecolor=color)
        else:
            label_bbox = None

        ax.text(
            bbox_x,
            bbox_y,
            '%s' % (coco.loadCats(ann['category_id'])[0]['name']),
            color='white',
            bbox=label_bbox)

    if is_filling:
        p = PatchCollection(
            polygons, facecolor=colors, linewidths=0, alpha=0.4)
        ax.add_collection(p)
    p = PatchCollection(
        polygons, facecolor='none', edgecolors=colors, linewidths=2)
    ax.add_collection(p)


coco = COCO('/gemini/code/mmdetection/cat_dataset/annotations/test.json')
image_ids = coco.getImgIds()
np.random.shuffle(image_ids)

plt.figure(figsize=(16, 5))

# Visualize only 8 images
for i in range(8):
    image_data = coco.loadImgs(image_ids[i])[0]
    image_path = osp.join('/gemini/code/mmdetection/cat_dataset/images/', image_data['file_name'])
    annotation_ids = coco.getAnnIds(
        imgIds=image_data['id'], catIds=[], iscrowd=0)
    annotations = coco.loadAnns(annotation_ids)

    ax = plt.subplot(2, 4, i + 1)
    image = Image.open(image_path).convert("RGB")

    # This line is essential; otherwise the image and its annotations may not line up
    image = apply_exif_orientation(image)

    ax.imshow(image)

    show_bbox_only(coco, annotations)

    # Use the file name of the image being drawn, not a variable left over from an earlier cell
    plt.title(f"{image_data['file_name']}")
    plt.xticks([])
    plt.yticks([])

plt.tight_layout()
4. Custom configuration files
This tutorial uses RTMDet for the demonstration. Before customizing the configuration file, let's first look at the RTMDet algorithm.
The model architecture diagram is shown above. RTMDet is a high-performance, low-latency detector that currently covers object detection, instance segmentation, and rotated box detection. In brief: to obtain a more efficient model architecture, the authors explore an architecture whose backbone and neck have compatible capacities, built from a basic building block that contains large-kernel depth-wise convolutions. They further introduce soft labels when computing the matching costs in dynamic label assignment to improve accuracy. Combined with better training techniques, the resulting object detector, named RTMDet, achieves 52.8% AP on COCO at over 300 FPS on an NVIDIA 3090 GPU, outperforming current mainstream industrial detectors. RTMDet achieves the best parameter-accuracy trade-off across small/medium/large/extra-large model sizes for a variety of application scenarios, and sets a new state of the art in real-time instance segmentation and rotated object detection.
The cat dataset has a single class, whereas the configurations MMDetection provides target the 80-class COCO dataset, so we need to override several important parameters in the configuration file.
A few points deserve attention:
- The most important part of a custom dataset is the metainfo field. Remember to pass it to the dataset once it is configured, otherwise it has no effect. (Some users like to edit the coco.py source code directly when customizing a dataset. This is strongly discouraged. The correct approach is to configure metainfo and pass it to the dataset.)
- If metainfo is configured incorrectly, the typical symptoms are: (1) a num_classes mismatch error is raised, (2) loss_bbox is always 0, or (3) the evaluation results after training are empty.
- Most of the learning rates that MMDetection provides assume training on 8 GPUs. If your total batch size is different, remember to scale the learning rate, otherwise some algorithms are prone to producing NaN. For details, see https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/train.html#id3
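The learning-rate point above is commonly handled with the linear scaling rule: scale the learning rate in proportion to the total batch size. The sketch below is a plain-Python illustration of that arithmetic, not an MMDetection API. The function name scale_lr is hypothetical, and the numbers (a base learning rate of 0.004 for 8 GPUs with 32 samples each) are illustrative assumptions; read the real values from the base configuration you inherit from.

```python
def scale_lr(base_lr, base_batch_size, num_gpus, samples_per_gpu):
    """Linearly scale a learning rate to a new total batch size.

    base_lr was tuned for base_batch_size samples per iteration;
    the returned value keeps lr / batch_size constant.
    """
    total_batch_size = num_gpus * samples_per_gpu
    return base_lr * total_batch_size / base_batch_size


# Assumed reference setup: lr tuned for 8 GPUs x 32 samples = 256 per step.
base_lr = 0.004
base_batch_size = 8 * 32

# Training on a single GPU with 32 samples per step instead.
new_lr = scale_lr(base_lr, base_batch_size, num_gpus=1, samples_per_gpu=32)
print(f'scaled learning rate: {new_lr:.6f}')
```

The documentation page linked above describes the options MMDetection itself offers for this adjustment.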
First, we create the configuration file in the cat_dataset folder (this is simply where I like to keep it).
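The contents of the configuration file are not reproduced here, so the following is only a minimal sketch of what cat_dataset/config_coco.py might look like for a single-class dataset. Several things are assumptions rather than facts from this tutorial: the base configuration path (an RTMDet-tiny config from the MMDetection repository), the training annotation file name trainval.json, the palette colour, and the choice to reuse test.json for validation. Adjust them to your own files.

```python
# Sketch of cat_dataset/config_coco.py. Assumed, not taken from this tutorial:
# the base config path, 'annotations/trainval.json', and the palette colour.
_base_ = '../configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py'

data_root = 'cat_dataset/'

# Note the trailing comma: ('cat') would be a plain string, not a tuple.
metainfo = dict(classes=('cat',), palette=[(220, 20, 60)])
num_classes = 1

# Override the 80-class COCO head with a single-class head.
model = dict(bbox_head=dict(num_classes=num_classes))

# metainfo must be passed to every dataset, otherwise it has no effect.
train_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='annotations/trainval.json',
        data_prefix=dict(img='images/')))

val_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='annotations/test.json',
        data_prefix=dict(img='images/')))
test_dataloader = val_dataloader

# The evaluator needs the path to the same annotation file.
val_evaluator = dict(ann_file=data_root + 'annotations/test.json')
test_evaluator = val_evaluator
```

Only the fields that differ from the base configuration are listed; everything else is inherited through _base_.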
Once the configuration file is written, check it with the following Python code:
from mmdet.registry import DATASETS, VISUALIZERS
from mmengine.config import Config
from mmengine.registry import init_default_scope
import matplotlib.pyplot as plt
import os.path as osp

cfg = Config.fromfile('/gemini/code/mmdetection/cat_dataset/config_coco.py')

init_default_scope(cfg.get('default_scope', 'mmdet'))

dataset = DATASETS.build(cfg.train_dataloader.dataset)
visualizer = VISUALIZERS.build(cfg.visualizer)
visualizer.dataset_meta = dataset.metainfo

plt.figure(figsize=(16, 5))

# Visualize only the first 8 images
for i in range(8):
    item = dataset[i]

    img = item['inputs'].permute(1, 2, 0).numpy()
    data_sample = item['data_samples'].numpy()
    gt_instances = data_sample.gt_instances
    img_path = osp.basename(item['data_samples'].img_path)

    gt_bboxes = gt_instances.get('bboxes', None)
    gt_instances.bboxes = gt_bboxes.tensor
    data_sample.gt_instances = gt_instances

    visualizer.add_datasample(
        osp.basename(img_path),
        img,
        data_sample,
        draw_pred=False,
        show=False)
    drawed_image = visualizer.get_image()

    plt.subplot(2, 4, i + 1)
    plt.imshow(drawed_image[..., [2, 1, 0]])
    plt.title(f"{osp.basename(img_path)}")
    plt.xticks([])
    plt.yticks([])

plt.tight_layout()
If the images are displayed with their annotations drawn correctly, the configuration file is fine.
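As a complement to the visual check, the metainfo mistakes described earlier can be caught with a few lines of plain Python before training. This is a sketch under assumptions: check_metainfo is a hypothetical helper, not an MMDetection function, and it expects the category names to be supplied by the caller (for example, collected from the annotation file with the coco.loadCats(coco.getCatIds()) calls used earlier).

```python
def check_metainfo(classes, category_names, num_classes):
    """Return a list of problems; an empty list means the settings agree.

    classes:        the 'classes' entry of metainfo in the config
    category_names: category names found in the annotation file
    num_classes:    the num_classes value set on the model head
    """
    if isinstance(classes, str):
        # ('cat') is just the string 'cat'; a one-element tuple is ('cat',).
        return ["classes is a string; write ('cat',) with a trailing comma"]

    problems = []
    unknown = [name for name in classes if name not in category_names]
    if unknown:
        problems.append(f'classes missing from the annotation file: {unknown}')
    if num_classes != len(classes):
        problems.append(
            f'num_classes is {num_classes} but metainfo lists '
            f'{len(classes)} classes')
    return problems


print(check_metainfo(('cat',), ['cat'], 1))    # consistent settings
print(check_metainfo(('cat'), ['cat'], 1))     # missing trailing comma
print(check_metainfo(('cats',), ['cat'], 80))  # wrong name and wrong count
```

A class name that does not match the annotation file is a likely cause of a zero loss_bbox or empty evaluation results, because no annotations end up matching the configured classes.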
Now you can start training:
python3 tools/train.py cat_dataset/config_coco.py