1. Prepare VOC data

Download the VOC data and place it in the datasets folder. Keep VOC2007 and VOC2012 as two separate folders; the internal structure of each folder does not need to be changed.

datasets
|----VOC2007
|----VOC2012
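
Before training, it can help to confirm that the folders are where they are expected. The following is a minimal sketch, assuming each VOC folder keeps the standard subfolders Annotations, ImageSets/Main and JPEGImages from the original archive. The helper name missing_voc_dirs and the default root path are illustrative and not part of detectron2.

```python
from pathlib import Path

# Standard subfolders of an extracted VOC year folder (assumed unchanged).
EXPECTED_SUBDIRS = ("Annotations", "ImageSets/Main", "JPEGImages")


def missing_voc_dirs(root="datasets", years=("VOC2007", "VOC2012")):
    """Return the expected VOC directories that do not exist under root."""
    root = Path(root)
    missing = []
    for year in years:
        for sub in EXPECTED_SUBDIRS:
            path = root / year / sub
            if not path.is_dir():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    problems = missing_voc_dirs()
    if problems:
        print("Missing directories:")
        for p in problems:
            print("  " + p)
    else:
        print("VOC2007 and VOC2012 look complete.")
```

An empty list means both folders have the assumed layout; any path that is printed should be checked before starting training.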

2. Data training

2.1 Training commands

Start training with the following command.

CUDA_VISIBLE_DEVICES=0 python3 train_net.py \
	--config-file ../configs/PascalVOC-Detection/faster_rcnn_R_50_FPN.yaml \
	--num-gpus 1 SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.01

CUDA_VISIBLE_DEVICES selects the GPU(s) used for training;
--config-file selects the network structure configuration;
--num-gpus sets the number of GPUs used for training;
SOLVER.IMS_PER_BATCH sets the training batch size;
SOLVER.BASE_LR sets the training learning rate.

2.2 Config yaml file

The original SOLVER settings in the configuration file faster_rcnn_R_50_FPN.yaml are as follows. The maximum number of iterations is 18000, which means that the entire data set is iterated for 17.4 epochs: the VOC data set has a total of 16551 training images, and with a batch size of 16, 18000*16/16551=17.4. If you train with a smaller batch size, you can increase the number of iterations so that the data set is still trained for 17.4 epochs.

SOLVER:
  STEPS: (12000, 16000)
  MAX_ITER: 18000  # 17.4 epochs
  WARMUP_ITERS: 100
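
The epoch arithmetic above can be reproduced, and the schedule rescaled for a different batch size, with a short sketch like the one below. The batch size of 16 is taken from the formula 18000*16/16551. Scaling STEPS by the same factor as MAX_ITER is an assumption made here to keep the learning-rate drops at the same point in training, and the function names are illustrative.

```python
def epochs(max_iter, batch_size, num_images):
    """Number of passes over the data set made in max_iter iterations."""
    return max_iter * batch_size / num_images


def rescale_schedule(max_iter, steps, old_batch, new_batch):
    """Scale MAX_ITER and STEPS so the number of epochs stays the same."""
    factor = old_batch / new_batch
    new_max_iter = round(max_iter * factor)
    new_steps = tuple(round(s * factor) for s in steps)
    return new_max_iter, new_steps


NUM_IMAGES = 16551  # total VOC training images, from the text above

print(round(epochs(18000, 16, NUM_IMAGES), 1))  # 17.4

max_iter, steps = rescale_schedule(18000, (12000, 16000), old_batch=16, new_batch=8)
print(max_iter, steps)  # 36000 (24000, 32000)
```

For a batch size of 8, this suggests 36000 iterations with steps at 24000 and 32000. These values could then be passed as SOLVER.MAX_ITER and SOLVER.STEPS overrides in the same way as SOLVER.IMS_PER_BATCH.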

3. Test results

3.1 Training evaluation

After training completes, the model is evaluated automatically and the following results are output.

[04/23 06:04:30] d2.evaluation.pascal_voc_evaluation INFO: Evaluating voc_2007_test using 2007 metric. Note that results do not use the official Matlab API.
[04/23 06:05:08] d2.engine.defaults INFO: Evaluation results for voc_2007_test in csv format:
[04/23 06:05:08] d2.evaluation.testing INFO: copypaste: Task: bbox
[04/23 06:05:08] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75
[04/23 06:05:08] d2.evaluation.testing INFO: copypaste: 51.3760,80.6508,55.5273
[04/23 06:05:08] d2.utils.events INFO:  eta: 0:00:01  iter: 17999  total_loss: 0.283  loss_cls: 0.109  loss_box_reg: 0.154  loss_rpn_cls: 0.008  loss_rpn_loc: 0.017  time: 1.1067  data_time: 0.0382  lr: 0.000100  max_mem: 9423M

The results show that on the VOC2007 test data set the model reaches an AP of 51.38 (averaged over IoU thresholds 0.5:0.95), an AP50 of 80.65 (IoU 0.5), and an AP75 of 55.53 (IoU 0.75).
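
If the metrics need to be collected from the log automatically, the "copypaste" lines can be parsed. The sketch below assumes the three-line pattern seen in the log above (a task line, a line of metric names, a line of values). The function name parse_copypaste is illustrative and not a detectron2 API.

```python
import re

# Matches the payload after "copypaste: " in a detectron2 log line.
COPYPASTE = re.compile(r"copypaste:\s*(.+)$")


def parse_copypaste(lines):
    """Return {task: {metric: value}} from the 'copypaste' log lines."""
    results = {}
    task, names = None, None
    for line in lines:
        match = COPYPASTE.search(line.strip())
        if not match:
            continue
        payload = match.group(1).strip()
        if payload.startswith("Task:"):
            # Start of a new result group, e.g. "Task: bbox".
            task, names = payload.split(":", 1)[1].strip(), None
        elif names is None:
            # First line after the task holds the metric names.
            names = payload.split(",")
        else:
            # Next line holds the values, in the same order as the names.
            values = [float(v) for v in payload.split(",")]
            results[task] = dict(zip(names, values))
            names = None
    return results


log = [
    "[04/23 06:05:08] d2.evaluation.testing INFO: copypaste: Task: bbox",
    "[04/23 06:05:08] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75",
    "[04/23 06:05:08] d2.evaluation.testing INFO: copypaste: 51.3760,80.6508,55.5273",
]
print(parse_copypaste(log))
# {'bbox': {'AP': 51.376, 'AP50': 80.6508, 'AP75': 55.5273}}
```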

3.2 Benchmark

The MODEL ZOO of detectron2 lists the Cityscapes and Pascal VOC baselines, including a Faster RCNN baseline on Pascal VOC. The mAP of the training result is basically the same as that baseline (note that the baseline row uses an R50-C4 backbone, while the training here uses R50-FPN).

Name	train time (s/iter)	inference time (s/im)	train mem (GB)	box AP	box AP50	mask AP	model id	download
R50-FPN, Cityscapes	0.240	0.078	4.4			36.5	142423278	model | metrics
R50-C4, VOC	0.537	0.081	4.8	51.9	80.3		142202221	model | metrics
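
To put a number on "basically the same", the gap between the training result and the baseline can be computed directly. The sketch below only uses the values from the training log and from the R50-C4, VOC row; the function name ap_gaps is illustrative.

```python
# Values copied from the training log and from the R50-C4, VOC baseline row.
trained = {"AP": 51.3760, "AP50": 80.6508}
baseline = {"AP": 51.9, "AP50": 80.3}


def ap_gaps(trained, baseline):
    """Return trained minus baseline for every metric present in both."""
    return {k: round(trained[k] - baseline[k], 2) for k in trained if k in baseline}


print(ap_gaps(trained, baseline))
# {'AP': -0.52, 'AP50': 0.35}
```

The trained model is about 0.5 AP below and about 0.35 AP50 above the baseline row.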

3.3 Inference

Detection on a single picture can be run with the following command.

python3 demo.py --config-file ../configs/PascalVOC-Detection/faster_rcnn_R_50_FPN.yaml \
  --input cat.jpg \
  --output result_cat_voc.jpg \
  --opts MODEL.WEIGHTS ../tools/output_bak/model_final.pth

Comparing the detection result with the original picture, it can be seen that the model has detected the cat in the picture.

Ubuntu 18.04: Training Faster RCNN on VOC data with detectron2 (2)