mmdetection: learning, training, and testing on your own dataset

First, the machine environment

mmdetection is a deep learning object detection toolbox open-sourced jointly by SenseTime and the Chinese University of Hong Kong. Source code: https://github.com/open-mmlab/mmdetection

Ubuntu 16.04

CUDA 9.0 + cuDNN 7.5

Python 3.6

GCC 7.2

Anaconda3
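
A quick way to confirm these versions on the machine (a minimal sketch; it assumes nvcc and gcc are on the PATH and only prints version information):

nvcc --version     # CUDA version
python --version   # Python version
gcc --version      # GCC version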

Second, the environment configuration

Official configuration tutorial (following the official tutorial is recommended).

1. Create a virtual environment using conda

conda create -n mmdetection python=3.6
source activate mmdetection

The Tsinghua Anaconda mirror can be used to speed up downloads:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge 
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/

# show channel URLs when searching
conda config --set show_channel_urls yes

# the pytorch channel is needed before installing pytorch
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/

2. Install pytorch

Install PyTorch with the command from the official website, but remove the -c pytorch flag; otherwise the Tsinghua mirror source will not be used.

For example, my CUDA version is 9.0, so the command I used was:

conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0
# cudatoolkit is just the CUDA toolkit package
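
A quick sanity check after installation (a minimal sketch; it only verifies that the CUDA build of PyTorch is usable):

import torch
print(torch.__version__)          # expect 1.1.0
print(torch.cuda.is_available())  # should print True if CUDA 9.0 is set up correctly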

3. Clone the repository

git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection

4. Run the command

python setup.py develop

It is recommended to install the required dependency packages before running this command (the command downloads them automatically, but very slowly).
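
One possible way to pre-install them (a sketch, assuming the repository's requirements.txt lists mmcv and the other dependencies; run it from the mmdetection directory):

pip install -r requirements.txt   # pre-install mmcv and the other dependencies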

At this point the mmdetection installation is complete. You can create a demo.py file and run a quick test. (The pretrained models are hosted on Amazon cloud servers, so downloading them can be slow.)

from mmdet.apis import init_detector, inference_detector, show_result
import mmcv

config_file = 'configs/faster_rcnn_x101_32x4d_fpn_1x.py'
checkpoint_file = 'checkpoints/faster_rcnn_x101_32x4d_fpn_2x_20181218-0ed58946.pth'

# build the model from a config file and a checkpoint file
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# test a single image and show the results
img = 'test.jpg'  # or img = mmcv.imread(img), which will only load it once
result = inference_detector(model, img)
# visualize the results in a new window
show_result(img, result, model.CLASSES)
# or save the visualization results to image files
#show_result(img, result, model.CLASSES, out_file='result.jpg')

# test a video and show the results
'''
video = mmcv.VideoReader('video.mp4')
for frame in video:
    result = inference_detector(model, frame)
    show_result(frame, result, model.CLASSES, wait_time=1)
'''

Third, VOC-format training and test datasets

1. Dataset location (if the dataset already exists on the server, a soft link is recommended; see the example after the directory tree below)

mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── VOCdevkit
│   │   ├── VOC2007
│   │   │   ├── Annotations
│   │   │   ├── JPEGImages
│   │   │   ├── ImageSets
│   │   │   │   ├── Main
│   │   │   │   │   ├── test.txt
│   │   │   │   │   ├── train.txt
│   │   │   │   │   ├── trainval.txt
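
If the dataset already lives elsewhere on the server, a soft link avoids copying it (a sketch; /path/to/VOCdevkit is a placeholder for the real location):

mkdir -p data
ln -s /path/to/VOCdevkit data/VOCdevkit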

2. Code modifications

① mmdet/datasets/voc.py: change the class list to the categories of your own dataset.

② mmdet/core/evaluation/class_names.py: change voc_classes() to return the categories of your own dataset.
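
A sketch of edits ① and ②, assuming a hypothetical three-class dataset (cat, dog, rabbit):

# mmdet/datasets/voc.py
class VOCDataset(XMLDataset):
    CLASSES = ('cat', 'dog', 'rabbit')   # replace the original 20 VOC classes

# mmdet/core/evaluation/class_names.py
def voc_classes():
    return ['cat', 'dog', 'rabbit']      # keep the order consistent with voc.py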

③ Modify the config file. I use RetinaNet as an example; my config is shown below:

# model settings
model = dict(
    type='RetinaNet',
    pretrained='open-mmlab://resnext101_32x4d',
    backbone=dict(
        type='ResNeXt',
        depth=101,
        groups=32,
        base_width=4,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        num_outs=5),
    bbox_head=dict(
        type='RetinaHead',
        num_classes=21, # change to the number of classes in your own dataset + 1
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        octave_base_scale=4,
        scales_per_octave=3,
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[8, 16, 32, 64, 128],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0)))
# training and testing settings
train_cfg = dict(
    assigner=dict(
        type='MaxIoUAssigner',
        pos_iou_thr=0.5,
        neg_iou_thr=0.4,
        min_pos_iou=0,
        ignore_iof_thr=-1),
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_thr=0.5),
    max_per_img=100)
# dataset settings
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/train.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/trainval.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline))
# optimizer
optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 32
device_ids = range(8)
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/retinanet_x101_32x4d_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]
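
A quick sanity check that the modified config parses and points at the right data (a sketch using mmcv's Config, assuming the file is saved as configs/RetinaNet.py):

from mmcv import Config

cfg = Config.fromfile('configs/RetinaNet.py')
print(cfg.model.bbox_head.num_classes)  # should be your number of classes + 1
print(cfg.data.train.ann_file)          # should point at your train.txt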

Fourth, training

python tools/train.py configs/RetinaNet.py --gpus 1 

--gpus sets the number of GPUs to use; by default counting starts from card 0.
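
For multi-GPU training the repository also provides a distributed launcher; a sketch assuming 2 GPUs:

./tools/dist_train.sh configs/RetinaNet.py 2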

Fifth, testing

Since test.py only includes eval for the COCO dataset, first use test.py to generate a .pkl result file, then compute the mAP with voc_eval.py.

python tools/test.py configs/RetinaNet.py work_dirs/latest.pth --out=eval/result.pkl

Use the .pkl file to compute the AP of each class:

python tools/voc_eval.py eval/result.pkl configs/RetinaNet.py

For more command-line usage, refer to the tutorial on GitHub.

Recommended Reading

https://blog.csdn.net/marshallwu1/article/details/93331712
