Train a model on your own dataset with MMDetection

GitHub: https://github.com/open-mmlab/mmdetection

Official documentation: Prerequisites — MMDetection 2.15.1 documentation

MMDetection officially recommends running it on Linux; in practice, using it on Windows surfaces many more bugs.

The following tutorial shows how to use MMDetection to train an object detection model on your own data. MMDetection is nicely designed: once the data is prepared, you only need to modify a configuration file slightly to run training. Configuration files for most models are already provided in MMDetection; you just inherit from them and override some parameters.
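As a taste of that mechanism, MMDetection config files support inheritance: a new config can set _base_ to a provided config and override only what changes. A minimal sketch (the file name and the num_classes override are just illustrations; the tutorial below achieves the same thing through the Python Config API instead):

# my_faster_rcnn.py -- hypothetical file name
# Inherit everything from a provided base config ...
_base_ = './configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco.py'

# ... and override only the fields that differ, e.g. the number of classes.
model = dict(roi_head=dict(bbox_head=dict(num_classes=3)))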

Install MMDetection

First, check your nvcc and gcc versions with the following commands; nvcc is needed to invoke the GPU, and gcc is needed to compile the CUDA operators.

# Check nvcc version
!nvcc -V
# Check GCC version
!gcc --version

Your computer will output the following information:

(screenshot: nvcc and gcc version output)

Next, install mmdetection. mmdetection is the object detection toolbox from OpenMMLab, which also provides libraries for other computer vision tasks such as semantic segmentation and classification. These libraries all depend on mmcv, so when installing, make sure the mmcv version matches the version of the component library. For example, the table below shows the compatibility between mmcv and mmdetection.

MMDetection version    MMCV version
master                 mmcv-full>=1.3.8, <1.4.0
2.15.1                 mmcv-full>=1.3.8, <1.4.0
2.15.0                 mmcv-full>=1.3.8, <1.4.0
2.14.0                 mmcv-full>=1.3.8, <1.4.0
2.13.0                 mmcv-full>=1.3.3, <1.4.0
2.12.0                 mmcv-full>=1.3.3, <1.4.0
2.11.0                 mmcv-full>=1.2.4, <1.4.0
2.10.0                 mmcv-full>=1.2.4, <1.4.0
2.9.0                  mmcv-full>=1.2.4, <1.4.0
2.8.0                  mmcv-full>=1.2.4, <1.4.0
2.7.0                  mmcv-full>=1.1.5, <1.4.0
2.6.0                  mmcv-full>=1.1.5, <1.4.0
2.5.0                  mmcv-full>=1.1.5, <1.4.0
2.4.0                  mmcv-full>=1.1.1, <1.4.0
2.3.0                  mmcv-full==1.0.5
2.3.0rc0               mmcv-full>=1.0.2
2.2.1                  mmcv==0.6.2
2.2.0                  mmcv==0.6.2
2.1.0                  mmcv>=0.5.9, <=0.6.1
2.0.0                  mmcv>=0.5.1, <=0.5.8
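If you need a prebuilt mmcv-full wheel for a specific CUDA and PyTorch combination, OpenMMLab hosts per-version package indexes. A hedged example, assuming CUDA 10.1 and PyTorch 1.5.x to match the Colab setup below:

# Pin mmcv-full to a version compatible with your MMDetection release,
# using the package index that matches your CUDA/PyTorch versions
!pip install mmcv-full==1.3.8 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.5.0/index.html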

If you are in a Jupyter environment, you can execute the following commands to complete the installation.

# install dependencies: (use cu101 because colab has CUDA 10.1)
!pip install -U torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html

# install mmcv-full thus we could use CUDA operators
!pip install mmcv-full

# Install mmdetection
!rm -rf mmdetection
!git clone https://github.com/open-mmlab/mmdetection.git
%cd mmdetection

!pip install -e .

# install Pillow 7.0.0 back in order to avoid bug in colab
!pip install Pillow==7.0.0

Then execute the following Python code to check whether the installation succeeded.

# Check Pytorch installation
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

# Check MMDetection installation
import mmdet
print(mmdet.__version__)

# Check mmcv installation
from mmcv.ops import get_compiling_cuda_version, get_compiler_version
print(get_compiling_cuda_version())
print(get_compiler_version())

If the installation is successful, the following information will be output to your command line.

(screenshot: version strings printed by the checks above)

You can also verify the setup by running inference with a pretrained Mask R-CNN model provided by MMDetection, using the following code.

!mkdir checkpoints
!wget -c https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth \
      -O checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth
from mmdet.apis import inference_detector, init_detector, show_result_pyplot

# Choose to use a config and initialize the detector
config = 'configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco.py'
# Setup a checkpoint file to load
checkpoint = 'checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'
# initialize the detector
model = init_detector(config, checkpoint, device='cuda:0')
# Use the detector to do inference
img = 'demo/demo.jpg'
result = inference_detector(model, img)
# Let's plot the result
show_result_pyplot(model, img, result, score_thr=0.3)

The effect is as follows:

(screenshot: demo image with detected objects and masks)
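For a Mask R-CNN model, result is a tuple (bbox_results, segm_results), where bbox_results contains one (N, 5) array per class holding x1, y1, x2, y2 and a confidence score. A small sketch of reading out the boxes (the 0.3 threshold is just an example):

# Unpack the inference result and print high-confidence detections
bbox_results, segm_results = result
for class_id, bboxes in enumerate(bbox_results):
    for x1, y1, x2, y2, score in bboxes:
        if score >= 0.3:
            print(model.CLASSES[class_id], (x1, y1, x2, y2), float(score))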

Prepare the data

Official documentation: Tutorial 2: Customize Datasets — MMDetection 2.15.1 documentation

Most object detection data needs to be converted to VOC or COCO format. VOC annotations are XML files where each bbox is given by its top-left and bottom-right corner coordinates; COCO annotations are a single JSON file where each bbox is given by its top-left corner plus width and height. Below we use a small subset of the KITTI dataset (KITTI Tiny) as our dataset. It can be downloaded as follows:

# download, decompress the data
!wget https://download.openmmlab.com/mmdetection/data/kitti_tiny.zip
!unzip kitti_tiny.zip > /dev/null
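To make the two bbox conventions above concrete, here is a minimal sketch of converting between them (these helpers are our own illustration, not MMDetection API):

# VOC stores corner coordinates (xmin, ymin, xmax, ymax);
# COCO stores the top-left corner plus size (x, y, width, height).
def voc_to_coco(box):
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]

def coco_to_voc(box):
    x, y, w, h = box
    return [x, y, x + w, y + h]

print(voc_to_coco([50, 60, 150, 200]))  # [50, 60, 100, 140]
print(coco_to_voc([50, 60, 100, 140]))  # [50, 60, 150, 200]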

The format of the dataset is as follows:

# Check the directory structure of the tiny data
# Install tree first
!apt-get -q install tree
!tree kitti_tiny

# Dataset layout: image_2 holds the images, label_2 holds the labels,
# and train.txt / val.txt list the training and validation samples
kitti_tiny
├── training
│   ├── image_2
│   │   ├── 000000.jpeg
│   │   ├── 000001.jpeg
│   │   ├── 000002.jpeg
│   │   ├── 000003.jpeg
│   └── label_2
│       ├── 000000.txt
│       ├── 000001.txt
│       ├── 000002.txt
│       ├── 000003.txt
├── train.txt
└── val.txt
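Each line in a label_2 file follows the KITTI annotation format: the class name comes first, and columns 5-8 hold the bbox corners (xmin, ymin, xmax, ymax), which is exactly what the custom dataset class later in this tutorial parses with x[4:8]. An illustrative (made-up) example line:

# class truncated occluded alpha xmin ymin xmax ymax (3D fields follow)
Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 1.59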

You can use the following code to preview one of the images:

# Let's take a look at the dataset image
import mmcv
import matplotlib.pyplot as plt

img = mmcv.imread('kitti_tiny/training/image_2/000073.jpeg')
plt.figure(figsize=(15, 10))
plt.imshow(mmcv.bgr2rgb(img))
plt.show()

(screenshot: a sample image from the KITTI Tiny dataset)

Train the model

After preparing the data, we only need to modify our configuration file to complete the training:

First, load a base configuration file. These files live in the configs directory; for example, here we load the Faster R-CNN configuration.

from mmcv import Config
cfg = Config.fromfile('./configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco.py')

Next, modify the configuration and save it, so the saved file can be loaded directly for inference later.

from mmdet.apis import set_random_seed

# Modify dataset type and path
cfg.dataset_type = 'KittiTinyDataset'
cfg.data_root = 'kitti_tiny/'

cfg.data.test.type = 'KittiTinyDataset'
cfg.data.test.data_root = 'kitti_tiny/'
cfg.data.test.ann_file = 'train.txt'
cfg.data.test.img_prefix = 'training/image_2'

cfg.data.train.type = 'KittiTinyDataset'
cfg.data.train.data_root = 'kitti_tiny/'
cfg.data.train.ann_file = 'train.txt'
cfg.data.train.img_prefix = 'training/image_2'

cfg.data.val.type = 'KittiTinyDataset'
cfg.data.val.data_root = 'kitti_tiny/'
cfg.data.val.ann_file = 'val.txt'
cfg.data.val.img_prefix = 'training/image_2'

# modify num classes of the model in box head
cfg.model.roi_head.bbox_head.num_classes = 3
# We can still use the pre-trained Mask RCNN model though we do not need to
# use the mask branch
cfg.load_from = 'checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'

# Set up working dir to save files and logs.
cfg.work_dir = './tutorial_exps'

# The original learning rate (LR) is set for 8-GPU training.
# We divide it by 8 since we only use one GPU.
cfg.optimizer.lr = 0.02 / 8
cfg.lr_config.warmup = None
cfg.log_config.interval = 10

# Change the evaluation metric since we use customized dataset.
cfg.evaluation.metric = 'mAP'
# We can set the evaluation interval to reduce the evaluation times
cfg.evaluation.interval = 12
# We can set the checkpoint saving interval to reduce the storage cost
cfg.checkpoint_config.interval = 12

# Set seed thus the results are more reproducible
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)


# We can initialize the logger for training and have a look
# at the final config used for training
print(f'Config:\n{cfg.pretty_text}')
# Save the full config (be sure not to skip this step)
cfg.dump(f'{cfg.work_dir}/customformat_kitti.py')
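Because the config was dumped into the work directory, a later session (e.g. for inference) can load it back directly instead of re-applying all the modifications. A sketch:

# Reload the saved config in a fresh session
from mmcv import Config
cfg = Config.fromfile('./tutorial_exps/customformat_kitti.py')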

Then you can start training:

import os.path as osp

import mmcv
from mmdet.datasets import build_dataset
from mmdet.models import build_detector
from mmdet.apis import train_detector


# Build dataset
datasets = [build_dataset(cfg.data.train)]

# Build the detector
model = build_detector(
    cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
# Add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES

# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_detector(model, datasets, cfg, distributed=False, validate=True)

After training for a while, you will see training records like the following, and log files will be written to the work directory:

(screenshot: training log output)

How to read the training logs

From the logs, we can have a basic understanding of the training process and know how well the detector is trained.

First, the ResNet-50 backbone pretrained on ImageNet is loaded, which is common practice since training from scratch is much more expensive. The log shows that all weights of the ResNet-50 backbone are loaded except conv1.bias, which has been merged into conv.weights.

Second, since our dataset is small, we loaded a pretrained Mask R-CNN model and fine-tuned it for detection. Because the detector we are actually using is Faster R-CNN, the weights of the mask branch, e.g. roi_head.mask_head, are unexpected keys in the source state_dict and are not loaded. Also, the original Mask R-CNN was trained on COCO, which contains 80 classes, while KITTI Tiny has only 3, so the final classification FC layer of the pretrained model has a different weight shape and is not used either.

Third, after training, the detector is evaluated with the default VOC-style evaluation. The results show that the detector reaches 54.1 mAP on the validation set, which is not bad!
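You can also re-run the evaluation from the command line with MMDetection's test script. A sketch, assuming the checkpoint hook saved latest.pth in the work directory:

# Evaluate the trained checkpoint with VOC-style mAP
!python tools/test.py tutorial_exps/customformat_kitti.py tutorial_exps/latest.pth --eval mAP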

Use the trained model

If you are working in Jupyter, you can continue running the following code to use the trained model.

img = mmcv.imread('kitti_tiny/training/image_2/000068.jpeg')

model.cfg = cfg
result = inference_detector(model, img)
show_result_pyplot(model, img, result)
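To save the visualization to a file instead of only displaying it, the detector's show_result method accepts an out_file argument. A sketch (the output file name is arbitrary):

# Render the detections onto the image and write the result to disk
model.show_result(img, result, score_thr=0.3, out_file='result_000068.jpg')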

If you are developing in a tool such as PyCharm, you can refer to this blog post for how to use your model:

Target detection using MMDetection - dejahu's blog - CSDN blog

Finally, here is the complete training code. Note that this standalone script assumes the dataset has been placed under data/kitti_tiny/ (unlike the notebook steps above, which used kitti_tiny/):

from mmcv import Config
from mmdet.datasets import build_dataset
from mmdet.models import build_detector
from mmdet.apis import train_detector
from mmdet.apis import set_random_seed
import os.path as osp
import mmcv
import numpy as np
from mmdet.datasets.builder import DATASETS
from mmdet.datasets.custom import CustomDataset
import warnings
warnings.filterwarnings('ignore')

@DATASETS.register_module()
class KittiTinyDataset(CustomDataset):
    CLASSES = ('Car', 'Pedestrian', 'Cyclist')
    def load_annotations(self, ann_file):
        cat2label = {k: i for i, k in enumerate(self.CLASSES)}
        # load image list from file
        image_list = mmcv.list_from_file(self.ann_file)

        data_infos = []
        # convert annotations to middle format
        for image_id in image_list:
            filename = f'{self.img_prefix}/{image_id}.jpeg'
            image = mmcv.imread(filename)
            height, width = image.shape[:2]

            data_info = dict(filename=f'{image_id}.jpeg', width=width, height=height)

            # load annotations
            label_prefix = self.img_prefix.replace('image_2', 'label_2')
            lines = mmcv.list_from_file(osp.join(label_prefix, f'{image_id}.txt'))

            content = [line.strip().split(' ') for line in lines]
            bbox_names = [x[0] for x in content]
            bboxes = [[float(info) for info in x[4:8]] for x in content]

            gt_bboxes = []
            gt_labels = []
            gt_bboxes_ignore = []
            gt_labels_ignore = []

            # filter 'DontCare'
            for bbox_name, bbox in zip(bbox_names, bboxes):
                if bbox_name in cat2label:
                    gt_labels.append(cat2label[bbox_name])
                    gt_bboxes.append(bbox)
                else:
                    gt_labels_ignore.append(-1)
                    gt_bboxes_ignore.append(bbox)

            data_anno = dict(
                bboxes=np.array(gt_bboxes, dtype=np.float32).reshape(-1, 4),
                labels=np.array(gt_labels, dtype=np.int64),
                bboxes_ignore=np.array(gt_bboxes_ignore,
                                       dtype=np.float32).reshape(-1, 4),
                labels_ignore=np.array(gt_labels_ignore, dtype=np.int64))

            data_info.update(ann=data_anno)
            data_infos.append(data_info)

        return data_infos

cfg = Config.fromfile('./configs/faster_rcnn/faster_rcnn_r50_caffe_fpn_mstrain_1x_coco.py')
# Modify dataset type and path
cfg.dataset_type = 'KittiTinyDataset'
cfg.data_root = 'data/kitti_tiny/'
cfg.data.test.type = 'KittiTinyDataset'
cfg.data.test.data_root = 'data/kitti_tiny/'
cfg.data.test.ann_file = 'train.txt'
cfg.data.test.img_prefix = 'training/image_2'
cfg.data.train.type = 'KittiTinyDataset'
cfg.data.train.data_root = 'data/kitti_tiny/'
cfg.data.train.ann_file = 'train.txt'
cfg.data.train.img_prefix = 'training/image_2'
cfg.data.val.type = 'KittiTinyDataset'
cfg.data.val.data_root = 'data/kitti_tiny/'
cfg.data.val.ann_file = 'val.txt'
cfg.data.val.img_prefix = 'training/image_2'
# modify num classes of the model in box head
cfg.model.roi_head.bbox_head.num_classes = 3
# We can still use the pre-trained Mask RCNN model though we do not need to
# use the mask branch
cfg.load_from = 'checkpoints/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'
# Set up working dir to save files and logs.
cfg.work_dir = './tutorial_exps'
# The original learning rate (LR) is set for 8-GPU training.
# We divide it by 8 since we only use one GPU.
cfg.optimizer.lr = 0.02 / 8
cfg.lr_config.warmup = None
cfg.log_config.interval = 10
# Change the evaluation metric since we use customized dataset.
cfg.evaluation.metric = 'mAP'
# We can set the evaluation interval to reduce the evaluation times
cfg.evaluation.interval = 12
# We can set the checkpoint saving interval to reduce the storage cost
cfg.checkpoint_config.interval = 12
# Set seed thus the results are more reproducible
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)
# We can initialize the logger for training and have a look
# at the final config used for training
print(f'Config:\n{cfg.pretty_text}')
# Save the full config (be sure not to skip this step)
cfg.dump(f'{cfg.work_dir}/customformat_kitti.py')

# Main training procedure
# Build dataset
datasets = [build_dataset(cfg.data.train)]

# Build the detector
model = build_detector(
    cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
# Add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES

# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_detector(model, datasets, cfg, distributed=False, validate=True)

Original post: blog.csdn.net/ECHOSON/article/details/119959863