1. Prepare VOC data
Download the Pascal VOC data and place it in the datasets folder. Extract VOC2007 and VOC2012 into two separate folders; their internal folder structure does not need to be changed.
datasets
|----VOC2007
|----VOC2012
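detectron2's built-in dataset registration looks for these folders under the directory named by the DETECTRON2_DATASETS environment variable, which defaults to datasets relative to the working directory. The sketch below checks that both years contain the sub-folders of a standard VOC release before training starts. The expected sub-folders (Annotations, ImageSets/Main, JPEGImages) are an assumption based on the usual VOC layout, and the helper name missing_voc_dirs is hypothetical.

```python
from pathlib import Path

# Sub-paths of a standard Pascal VOC release (assumed layout).
REQUIRED = ("Annotations", "ImageSets/Main", "JPEGImages")


def missing_voc_dirs(root="datasets", years=("VOC2007", "VOC2012")):
    """Return the expected VOC sub-directories that do not exist under root."""
    missing = []
    for year in years:
        for sub in REQUIRED:
            path = Path(root) / year / sub
            if not path.is_dir():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    problems = missing_voc_dirs()
    if problems:
        print("Missing:", *problems, sep="\n  ")
    else:
        print("VOC layout looks complete.")
```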
2. Model training
2.1 Training commands
Start training with the following command.
CUDA_VISIBLE_DEVICES=0 python3 train_net.py \
--config-file ../configs/PascalVOC-Detection/faster_rcnn_R_50_FPN.yaml \
--num-gpus 1 SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01
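The parameters are:
CUDA_VISIBLE_DEVICES: selects which GPU(s) the training runs on;
--config-file: selects the network configuration file;
--num-gpus: sets the number of GPUs used for training;
SOLVER.IMS_PER_BATCH: sets the training batch size (total across all GPUs);
SOLVER.BASE_LR: sets the base learning rate.
Trailing KEY VALUE pairs such as SOLVER.IMS_PER_BATCH 8 override the corresponding entries in the YAML configuration file.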
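The command halves both the batch size and the learning rate relative to the defaults inherited from Base-RCNN-FPN.yaml (IMS_PER_BATCH 16, BASE_LR 0.02). A common heuristic for choosing BASE_LR is the linear scaling rule: scale the learning rate in proportion to the total batch size. The sketch below applies that rule; it is a rule of thumb rather than something detectron2 does automatically, and the function name scaled_lr is hypothetical.

```python
def scaled_lr(batch_size, ref_batch=16, ref_lr=0.02):
    """Scale the learning rate linearly with the total batch size.

    ref_batch/ref_lr are assumed to be the Base-RCNN-FPN.yaml defaults
    (IMS_PER_BATCH 16, BASE_LR 0.02).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return ref_lr * batch_size / ref_batch


for bs in (16, 8, 4, 2):
    print(f"IMS_PER_BATCH {bs:>2} -> BASE_LR {scaled_lr(bs):g}")
```

With IMS_PER_BATCH 8 this gives BASE_LR 0.01, matching the command above.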
2.2 Config YAML file
The original SOLVER settings in faster_rcnn_R_50_FPN.yaml are shown below. MAX_ITER is 18000, which corresponds to about 17.4 epochs, i.e. 17.4 passes over the entire training set. The training set (VOC2007 trainval plus VOC2012 trainval) has 16551 images, so with the default batch size of 16: 18000 × 16 / 16551 ≈ 17.4. If you train with a smaller batch size, increase the number of iterations (and the STEPS values) accordingly so that the model is still trained for about 17.4 epochs.
SOLVER:
  STEPS: (12000, 16000)
  MAX_ITER: 18000  # 17.4 epochs
  WARMUP_ITERS: 100
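The epoch arithmetic above, and the adjustment it implies for a smaller batch, can be sketched as follows. The sketch assumes the 16551-image training set and the reference schedule shown in the YAML (batch 16, MAX_ITER 18000, STEPS (12000, 16000)); the function names are hypothetical, and WARMUP_ITERS is left unchanged.

```python
NUM_TRAIN_IMAGES = 16551  # VOC2007 trainval + VOC2012 trainval


def epochs(max_iter, batch_size, num_images=NUM_TRAIN_IMAGES):
    """Number of passes over the training set for a given schedule."""
    return max_iter * batch_size / num_images


def rescale_schedule(batch_size, ref_batch=16, max_iter=18000, steps=(12000, 16000)):
    """Stretch MAX_ITER and STEPS so the epoch count matches the reference schedule."""
    factor = ref_batch / batch_size
    return round(max_iter * factor), tuple(round(s * factor) for s in steps)


print(f"{epochs(18000, 16):.1f} epochs")  # 17.4 epochs
max_iter, steps = rescale_schedule(8)
print(max_iter, steps)  # 36000 (24000, 32000)
print(f"{epochs(max_iter, 8):.1f} epochs")  # 17.4 epochs
```

For a batch size of 8, the resulting values should be passable on the command line as SOLVER.MAX_ITER 36000 SOLVER.STEPS "(24000, 32000)".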
3. Test results
3.1 Training evaluation
When training finishes, the model is evaluated on the VOC2007 test set and the following results are printed.
[04/23 06:04:30] d2.evaluation.pascal_voc_evaluation INFO: Evaluating voc_2007_test using 2007 metric. Note that results do not use the official Matlab API.
[04/23 06:05:08] d2.engine.defaults INFO: Evaluation results for voc_2007_test in csv format:
[04/23 06:05:08] d2.evaluation.testing INFO: copypaste: Task: bbox
[04/23 06:05:08] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75
[04/23 06:05:08] d2.evaluation.testing INFO: copypaste: 51.3760,80.6508,55.5273
[04/23 06:05:08] d2.utils.events INFO: eta: 0:00:01 iter: 17999 total_loss: 0.283 loss_cls: 0.109 loss_box_reg: 0.154 loss_rpn_cls: 0.008 loss_rpn_loc: 0.017 time: 1.1067 data_time: 0.0382 lr: 0.000100 max_mem: 9423M
On the VOC2007 test set, the model reaches an AP (averaged over IoU thresholds 0.5:0.95) of 51.38, an AP50 (IoU 0.5) of 80.65, and an AP75 (IoU 0.75) of 55.53.
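The "copypaste" lines in the log are meant for copying results elsewhere. To collect them automatically, for example from the log.txt that training writes to its output directory, a small parser like the one below can be used. It assumes the metric-name line directly precedes the value line, as in the log above; the function name parse_copypaste is hypothetical.

```python
import re

COPYPASTE = re.compile(r"copypaste: (.+)$")


def parse_copypaste(lines):
    """Return {metric: value} from detectron2 'copypaste:' log lines."""
    payloads = [m.group(1).strip() for line in lines if (m := COPYPASTE.search(line))]
    results = {}
    # Pair each payload with the next one: a name line followed by a value line.
    for header, values in zip(payloads, payloads[1:]):
        names = header.split(",")
        try:
            nums = [float(v) for v in values.split(",")]
        except ValueError:
            continue  # the second line is not numeric, so this is not a name/value pair
        if len(names) == len(nums):
            results.update(zip(names, nums))
    return results


log = [
    "[04/23 06:05:08] d2.evaluation.testing INFO: copypaste: Task: bbox",
    "[04/23 06:05:08] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75",
    "[04/23 06:05:08] d2.evaluation.testing INFO: copypaste: 51.3760,80.6508,55.5273",
]
print(parse_copypaste(log))  # {'AP': 51.376, 'AP50': 80.6508, 'AP75': 55.5273}
```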
3.2 Benchmark
The detectron2 MODEL_ZOO lists baselines on Cityscapes and Pascal VOC (Mask R-CNN R50-FPN on Cityscapes, Faster R-CNN R50-C4 on VOC). The AP (51.38) and AP50 (80.65) of the trained model are close to the VOC baseline (51.9 and 80.3), although that baseline uses an R50-C4 backbone rather than the R50-FPN trained here.
Name | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | box AP50 | mask AP | model id | download
---|---|---|---|---|---|---|---|---
R50-FPN, Cityscapes | 0.240 | 0.078 | 4.4 | | | 36.5 | 142423278 | model / metrics
R50-C4, VOC | 0.537 | 0.081 | 4.8 | 51.9 | 80.3 | | 142202221 | model / metrics
3.3 Inference
Run detection on a single image with the following command:
python3 demo.py --config-file ../configs/PascalVOC-Detection/faster_rcnn_R_50_FPN.yaml \
--input cat.jpg \
--output result_cat_voc.jpg \
--opts MODEL.WEIGHTS ../tools/output_bak/model_final.pth
The detection result on the original image:
[Figure: detection result on cat.jpg (result_cat_voc.jpg)]
The model successfully detects the cat in the image.
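To inspect detections as data rather than as a drawn image, the predicted class indices need to be mapped to VOC class names. The sketch below assumes the fields have already been converted to plain lists (for example with pred_classes.tolist(), scores.tolist() and pred_boxes.tensor.tolist() on the Instances returned by the predictor), and that the indices follow the usual alphabetical VOC class order. It filters by a score threshold similar to demo.py's --confidence-threshold (default 0.5). The helper name and the example values are hypothetical, for illustration only.

```python
# Assumed VOC class order (0-based indices), as used by detectron2's VOC loader.
VOC_CLASSES = (
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)


def readable_detections(classes, scores, boxes, threshold=0.5):
    """Pair class names with scores and boxes, dropping detections at or below threshold."""
    kept = []
    for cls, score, box in zip(classes, scores, boxes):
        if score > threshold:
            kept.append((VOC_CLASSES[cls], round(score, 3), box))
    return kept


# Hypothetical predictor output (not real results).
print(readable_detections([7, 11], [0.98, 0.21],
                          [[12.0, 30.0, 410.0, 380.0], [5.0, 5.0, 60.0, 70.0]]))
# [('cat', 0.98, [12.0, 30.0, 410.0, 380.0])]
```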