AI Practical Training Camp & MMDetection Installation and Configuration Guide

1. Introduction to MMDetection

MMDetection is a widely used detection toolbox covering object detection, instance segmentation, panoptic segmentation, and other general detection tasks. It supports 75+ mainstream and cutting-edge models and provides users with more than 440 pre-trained models. It is widely used in both academic research and industrial deployment. The main features of the framework are:

  • Modular design
    MMDetection decouples the detection framework into different module components. By combining these components, users can easily build customized detection models.
  • Support for a variety of detection tasks
    MMDetection supports a variety of detection tasks, including object detection, instance segmentation, panoptic segmentation, and semi-supervised object detection. In the future, the focus will be on supporting multi-modal general-purpose detection.
  • Fast speed
    Basic box and mask operations are implemented on the GPU, and training speed is faster than or comparable to other codebases.
  • High performance
    MMDetection grew out of the code developed by the MMDet team, winner of the COCO 2018 object detection challenge, and has been continuously improved since then. The newly released RTMDet also achieves state-of-the-art results in real-time instance segmentation and rotated object detection, while achieving the best parameter-accuracy balance among object detection models.

Version changes from 2.0 to 3.0

Compared with MMDetection V2.0, V3.0 is decoupled into more fine-grained modules: abstractions such as the dataset, data transforms, model, evaluator, and visualizer have been further separated, and their interfaces designed in a unified way. The unified data flow and fine-grained modules greatly improve extensibility to new tasks. Built on the new training engine MMEngine and the foundational computer vision library MMCV, and with each model component refactored and optimized, MMDetection has improved comprehensively in both speed and accuracy, reaching the top level among existing detection frameworks.

MMDetection repo: https://github.com/open-mmlab/mmdetection
MMDetection official documentation: https://mmdetection.readthedocs.io/en/latest/

2. Environmental testing and installation

First, enter the following commands in Jupyter. You can also run them in a terminal; just remove the leading ! character. They print out your machine's environment information.

# Check nvcc version
!nvcc -V
# Check GCC version
!gcc --version
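
It is also worth confirming that PyTorch can see the GPU before installing MMDetection. A quick check, assuming PyTorch is already installed in the environment:

import torch
print(torch.__version__)          # PyTorch version
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.is_available())  # True if a GPU is usable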


# Install the mmengine and mmcv dependencies
# To prevent later version changes from breaking the code, we pin the versions for now
!pwd
%pip install -U "openmim"
!mim install "mmengine"
!mim install "mmcv"
# Install mmdetection
!rm -rf mmdetection
!git clone https://github.com/open-mmlab/mmdetection.git
%cd mmdetection
%pip install -e .
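
To confirm that everything installed correctly, mim can list the OpenMMLab packages in the current environment (mim list is part of openmim):

!mim list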

Use the following code to print the environment information:

from mmengine.utils import get_git_hash
from mmengine.utils.dl_utils import collect_env as collect_base_env

import mmdet

# Collect and print environment information
def collect_env():
    """Collect the information of the running environments."""
    env_info = collect_base_env()
    env_info['MMDetection'] = f'{mmdet.__version__}+{get_git_hash()[:7]}'
    return env_info


if __name__ == '__main__':
    for name, val in collect_env().items():
        print(f'{name}: {val}')


3. Prepare the dataset

First, go to our MMDetection directory and download the dataset.

The prepared data is in COCO format.
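
For reference, a COCO-style annotation file is a single JSON object with images, annotations, and categories lists. A minimal sketch of the structure (all values here are made-up examples):

coco_example = {
    "images": [
        {"id": 1, "file_name": "cat_01.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 50.0, 200.0, 150.0],  # [x, y, width, height]
         "area": 30000.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "cat"},
    ],
}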

Use the following code to preview the data; we will look at just 8 images:

import os
import matplotlib.pyplot as plt
from PIL import Image

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

plt.figure(figsize=(16, 5))

# Take the first 8 image filenames
image_paths = os.listdir('cat_dataset/images')[:8]

for i, filename in enumerate(image_paths):
    image = Image.open('cat_dataset/images/' + filename).convert("RGB")

    plt.subplot(2, 4, i + 1)
    plt.imshow(image)
    plt.title(filename)
    plt.xticks([])
    plt.yticks([])

plt.tight_layout()


from pycocotools.coco import COCO
from PIL import Image
import numpy as np
import os.path as osp
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

def apply_exif_orientation(image):
    # 274 is the EXIF tag id for image orientation
    _EXIF_ORIENT = 274
    if not hasattr(image, 'getexif'):
        return image

    try:
        exif = image.getexif()
    except Exception:
        exif = None

    if exif is None:
        return image

    orientation = exif.get(_EXIF_ORIENT)

    method = {
        2: Image.FLIP_LEFT_RIGHT,
        3: Image.ROTATE_180,
        4: Image.FLIP_TOP_BOTTOM,
        5: Image.TRANSPOSE,
        6: Image.ROTATE_270,
        7: Image.TRANSVERSE,
        8: Image.ROTATE_90,
    }.get(orientation)
    if method is not None:
        return image.transpose(method)
    return image


def show_bbox_only(coco, anns, show_label_bbox=True, is_filling=True):
    """Show bounding box of annotations Only."""
    if len(anns) == 0:
        return

    ax = plt.gca()
    ax.set_autoscale_on(False)

    image2color = dict()
    for cat in coco.getCatIds():
        image2color[cat] = (np.random.random((1, 3)) * 0.7 + 0.3).tolist()[0]

    polygons = []
    colors = []

    for ann in anns:
        color = image2color[ann['category_id']]
        bbox_x, bbox_y, bbox_w, bbox_h = ann['bbox']
        poly = [[bbox_x, bbox_y], [bbox_x, bbox_y + bbox_h],
                [bbox_x + bbox_w, bbox_y + bbox_h], [bbox_x + bbox_w, bbox_y]]
        polygons.append(Polygon(np.array(poly).reshape((4, 2))))
        colors.append(color)

        if show_label_bbox:
            label_bbox = dict(facecolor=color)
        else:
            label_bbox = None

        ax.text(
            bbox_x,
            bbox_y,
            '%s' % (coco.loadCats(ann['category_id'])[0]['name']),
            color='white',
            bbox=label_bbox)

    if is_filling:
        p = PatchCollection(
            polygons, facecolor=colors, linewidths=0, alpha=0.4)
        ax.add_collection(p)
    p = PatchCollection(
        polygons, facecolor='none', edgecolors=colors, linewidths=2)
    ax.add_collection(p)

    
coco = COCO('/gemini/code/mmdetection/cat_dataset/annotations/test.json')
image_ids = coco.getImgIds()
np.random.shuffle(image_ids)

plt.figure(figsize=(16, 5))

# Visualize only 8 images
for i in range(8):
    image_data = coco.loadImgs(image_ids[i])[0]
    image_path = osp.join('/gemini/code/mmdetection/cat_dataset/images/',image_data['file_name'])
    annotation_ids = coco.getAnnIds(
            imgIds=image_data['id'], catIds=[], iscrowd=0)
    annotations = coco.loadAnns(annotation_ids)
    
    ax = plt.subplot(2, 4, i+1)
    image = Image.open(image_path).convert("RGB")
    
    # This line is critical; without it the images and labels may not match up
    image = apply_exif_orientation(image)
    
    ax.imshow(image)
    
    show_bbox_only(coco, annotations)
    
    plt.title(f"{
      
      filename}")
    plt.xticks([])
    plt.yticks([])
        
plt.tight_layout()


4. Custom configuration files

This tutorial uses RTMDet for the demonstration. Before customizing the configuration file, let's first take a quick look at the RTMDet algorithm.

(Figure: RTMDet model architecture)
The model architecture is shown above. RTMDet is a high-performance, low-latency detection algorithm that currently supports object detection, instance segmentation, and rotated box detection. In brief: to obtain a more efficient model architecture, MMDetection explores an architecture with compatible capacities in the backbone and neck, built from a basic building block that consists of large-kernel depth-wise convolutions. MMDetection further introduces soft labels when computing the matching cost in dynamic label assignment to improve accuracy. Combined with better training techniques, the resulting object detector, named RTMDet, achieves 52.8% COCO AP at over 300 FPS on an NVIDIA 3090 GPU, outperforming current mainstream industrial detectors. RTMDet achieves the best parameter-accuracy trade-off across small/medium/large/extra-large model sizes for a variety of application scenarios, and sets new state-of-the-art performance on real-time instance segmentation and rotated object detection.

The cat dataset has a single class, while the configs MMDetection provides target the 80 COCO classes, so we need to override some important parameters through the config.
Several issues need attention:

  • The most important field for a custom dataset is metainfo. Remember to pass it to the dataset after configuring it, otherwise it will not take effect. (Some users like to modify the coco.py source directly when customizing a dataset; this is strongly discouraged. The correct way is to configure metainfo and pass it to the dataset.)
  • If metainfo is configured incorrectly, several symptoms typically appear: (1) a num_classes mismatch error, (2) loss_bbox stays at 0, (3) empty evaluation results after training.
  • Most learning rates provided by MMDetection assume 8 GPUs. If your total batch size differs, remember to scale the learning rate, otherwise some algorithms easily produce NaN. For details, see https://mmdetection.readthedocs.io/zh_CN/latest/user_guides/train.html#id3

First, we create the configuration file under the cat_dataset folder (my usual preference); a sketch of such a config follows below.
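
As a reference, here is a minimal sketch of what such a config file might look like. The base config name, annotation filenames, batch size, and learning rate below are assumptions (only annotations/test.json appears in this tutorial), so adapt them to your setup:

# cat_dataset/config_coco.py -- a minimal sketch, not the tutorial's exact file
_base_ = '../configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py'

data_root = 'cat_dataset/'

# Single-class metainfo; it must be passed into every dataset below,
# otherwise it will not take effect
metainfo = dict(classes=('cat', ), palette=[(220, 20, 60)])

# The head must match the number of classes to avoid num_classes errors
model = dict(bbox_head=dict(num_classes=1))

train_dataloader = dict(
    batch_size=8,
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='annotations/trainval.json',  # hypothetical filename
        data_prefix=dict(img='images/')))

val_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='annotations/test.json',
        data_prefix=dict(img='images/')))
test_dataloader = val_dataloader

val_evaluator = dict(ann_file=data_root + 'annotations/test.json')
test_evaluator = val_evaluator

# The base RTMDet config assumes 8 GPUs x batch 32 (total 256) with lr 0.004;
# scale the LR linearly for a single GPU with batch 8 to avoid NaN losses
optim_wrapper = dict(optimizer=dict(lr=0.004 * 8 / 256))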

After the configuration file is written, we can check it with the following Python code:

from mmdet.registry import DATASETS, VISUALIZERS
from mmengine.config import Config
from mmengine.registry import init_default_scope
import matplotlib.pyplot as plt
import os.path as osp

cfg = Config.fromfile('/gemini/code/mmdetection/cat_dataset/config_coco.py')

init_default_scope(cfg.get('default_scope', 'mmdet'))

dataset = DATASETS.build(cfg.train_dataloader.dataset)
visualizer = VISUALIZERS.build(cfg.visualizer)
visualizer.dataset_meta = dataset.metainfo

plt.figure(figsize=(16, 5))

# Visualize only the first 8 images
for i in range(8):
    item = dataset[i]

    img = item['inputs'].permute(1, 2, 0).numpy()
    data_sample = item['data_samples'].numpy()
    gt_instances = data_sample.gt_instances
    img_path = osp.basename(item['data_samples'].img_path)

    gt_bboxes = gt_instances.get('bboxes', None)
    gt_instances.bboxes = gt_bboxes.tensor
    data_sample.gt_instances = gt_instances

    visualizer.add_datasample(
        osp.basename(img_path),
        img,
        data_sample,
        draw_pred=False,
        show=False)
    drawed_image = visualizer.get_image()

    plt.subplot(2, 4, i + 1)
    plt.imshow(drawed_image[..., [2, 1, 0]])
    plt.title(osp.basename(img_path))
    plt.xticks([])
    plt.yticks([])
plt.tight_layout()

If the images and their annotations render correctly, the configuration file is fine.
Now you can start training:

python3 tools/train.py cat_dataset/config_coco.py
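
After training finishes, you can try the trained weights on an image. Below is a minimal sketch using MMDetection 3.x's DetInferencer; the checkpoint filename and test image name are hypothetical and will depend on your run:

from mmdet.apis import DetInferencer

# Config plus a trained checkpoint; epoch_300.pth is a hypothetical filename,
# check work_dirs/config_coco/ for the actual one
inferencer = DetInferencer(
    model='cat_dataset/config_coco.py',
    weights='work_dirs/config_coco/epoch_300.pth')

# Run inference on a single image and save the visualization to outputs/
inferencer('cat_dataset/images/cat_01.jpg', out_dir='outputs/')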


Source: blog.csdn.net/shengweiit/article/details/131115911