NanoDet: train your own dataset


Introduction to NanoDet

NanoDet repository: https://github.com/RangiLyu/nanodet

NanoDet is an ultra-fast, lightweight, anchor-free object detection model for mobile devices. The model has the following advantages:

  • Ultra-lightweight: the model file is only 1.8 MB;
  • Super fast: 97 FPS (10.23 ms per frame) on a mobile ARM CPU;
  • Training friendly: GPU memory cost is much lower than comparable models; a batch size of 80 fits on a GTX 1060 6GB;
  • Easy to deploy: a C++ implementation and an Android demo based on the ncnn inference framework are provided.

Model performance

At present, the open-source NanoDet-m model has only 0.72B FLOPs at 320x320 input resolution, while yolov4-tiny has 6.96B, nearly ten times more. The model has only 0.95M parameters, and the weight file is just 1.8 MB after 16-bit storage with ncnn optimize.

Although the model is very lightweight, its performance should not be underestimated. For the comparison with other models, the project author chose COCO mAP (0.5:0.95) as the evaluation metric, which accounts for both detection and localization accuracy, evaluated on the 5000 images of COCO val without Test-Time Augmentation. In this setting, NanoDet reaches 20.6 mAP at 320x320 input, 4 points higher than tiny-yolov3 and only 1 point lower than yolov4-tiny. When the input resolution is kept the same as YOLO's, with both at 416x416, NanoDet and yolov4-tiny score the same. The specific results are shown in the following table:

[Image: COCO mAP comparison table]

The above performance figures were measured with ncnn on a Kirin 980 (4xA76 + 4xA55) ARM CPU.

In addition, the project author deployed the model with ncnn on a phone (Kirin 980 ARM CPU, 4 A76 cores and 4 A55 cores) and ran a benchmark: the forward pass takes only about 10 ms, while yolov3-tiny and yolov4-tiny are both on the order of 30 ms. In the Android camera demo app, NanoDet easily runs at 40+ FPS, including image preprocessing, detection-box post-processing, and drawing of the detection boxes.


Performance comparison between NanoDet and yolov4-tiny.

Finally, the project provides an Android demo, a C++ demo and a Python demo. NanoDet's detection results on the Android side are shown below:

[Image: NanoDet detection demo on Android]

Train your own dataset

I have put all the code I use on GitHub; feel free to star it.

https://github.com/zlszhonglongshen/nanodet_cigar

You need to store your own dataset in train and val folders (cigar/train and cigar/val in my repo). Using the labelImg tool you can easily produce annotation files in VOC format; a sample annotation is shown after the directory tree below.

Note: the image file extensions must all be consistent, otherwise an error will be reported.

The expected layout (here for a 'fire' dataset) is:

fire
├── train
│   ├── ann
│   │   ├── 1.xml
│   │   └── 2.xml
│   └── img
│       ├── 1.jpg
│       └── 2.jpg
└── val
    ├── ann
    │   └── 1.xml
    └── img
        └── 1.jpg

NanoDet supports both VOC and COCO annotation formats. You need to create your own config file, for example:

save_dir: ./fire
num_classes: 1
class_names: &class_names ['fire']
train:
  name: xml_dataset
  img_path: ./fire/train/img
  ann_path: ./fire/train/ann
  input_size: [320,320]
val:
  name: xml_dataset
  img_path: ./fire/val/img
  ann_path: ./fire/val/ann
  input_size: [320,320]
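Before launching training it can help to confirm that every image has a matching annotation and that the file extensions really are consistent. The small script below is only a sketch for the fire/ layout shown above; the check_split helper and the .jpg extension are my own assumptions, so adjust them to your data:

import os

def check_split(img_dir, ann_dir, img_ext='.jpg'):
    """Verify that every image has a matching VOC XML annotation."""
    imgs = sorted(f for f in os.listdir(img_dir) if f.lower().endswith(img_ext))
    missing = []
    for name in imgs:
        stem = os.path.splitext(name)[0]
        if not os.path.exists(os.path.join(ann_dir, stem + '.xml')):
            missing.append(name)
    print(f'{img_dir}: {len(imgs)} images, {len(missing)} without annotations')
    for name in missing:
        print('  missing annotation for', name)

for split in ('train', 'val'):
    check_split(os.path.join('fire', split, 'img'),
                os.path.join('fire', split, 'ann'))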

The training command is as follows:

/home/zhongls/.conda/envs/pyzhongls/bin/python  train.py  cigar/nanodet_card.yml

Result file

When training finishes you will find model_last.pth in the save_dir folder specified in the config.
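If you want a quick look at what was saved before writing a full test script, the checkpoint can be opened directly with PyTorch. This is only a sketch; it assumes the weights are stored under a 'state_dict' key, which is what NanoDet's load_model_weight expects, so adjust if your version differs:

import torch

# Load the checkpoint on CPU and inspect its contents
ckpt = torch.load('model_last.pth', map_location='cpu')
print(list(ckpt.keys()))
if 'state_dict' in ckpt:
    # Print the first few weight tensors and their shapes
    for name in list(ckpt['state_dict'])[:5]:
        print(name, tuple(ckpt['state_dict'][name].shape))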

Test script

import time
import os
import cv2
import torch

from nanodet.util import cfg, load_config, Logger
from nanodet.model.arch import build_model
from nanodet.util import load_model_weight
from nanodet.data.transform import Pipeline

# Paths to the training config, the trained weights, and a test image
config_path = 'nanodet_card.yml'
model_path = 'model_last.pth'
image_path = '000-0.jpg'

# Load the config and set up a console logger
load_config(cfg, config_path)
logger = Logger(-1, use_tensorboard=False)

class Predictor(object):
    def __init__(self, cfg, model_path, logger, device='cuda:0'):
        self.cfg = cfg
        self.device = device
        # Build the network from the config and load the trained weights
        model = build_model(cfg.model)
        ckpt = torch.load(model_path, map_location=lambda storage, loc: storage)
        load_model_weight(model, ckpt, logger)
        self.model = model.to(device).eval()
        # Validation preprocessing pipeline defined in the config
        self.pipeline = Pipeline(cfg.data.val.pipeline, cfg.data.val.keep_ratio)

    def inference(self, img):
        img_info = {}
        height, width = img.shape[:2]
        img_info['height'] = height
        img_info['width'] = width
        meta = dict(img_info=img_info, raw_img=img, img=img)
        # Preprocess the image, then move it to the device as an NCHW tensor
        meta = self.pipeline(meta, self.cfg.data.val.input_size)
        meta['img'] = torch.from_numpy(meta['img'].transpose(2, 0, 1)).unsqueeze(0).to(self.device)
        with torch.no_grad():
            results = self.model.inference(meta)
        return meta, results

    def visualize(self, dets, meta, class_names, score_thres, wait=0):
        time1 = time.time()
        self.model.head.show_result(meta['raw_img'], dets, class_names, score_thres=score_thres, show=True)
        print('viz time: {:.3f}s'.format(time.time()-time1))
        
        
# Run on CPU; use device='cuda:0' if a GPU is available
predictor = Predictor(cfg, model_path, logger, device='cpu')


from nanodet.util import overlay_bbox_cv

from IPython.display import display
from PIL import Image

def cv2_imshow(a, convert_bgr_to_rgb=True):
    """A replacement for cv2.imshow() for use in Jupyter notebooks.
    Args:
        a: np.ndarray. shape (N, M) or (N, M, 1) is an NxM grayscale image. shape
            (N, M, 3) is an NxM BGR color image. shape (N, M, 4) is an NxM BGRA color
            image.
        convert_bgr_to_rgb: switch to convert BGR to RGB channel.
    """
    a = a.clip(0, 255).astype('uint8')
    # cv2 stores colors as BGR; convert to RGB
    if convert_bgr_to_rgb and a.ndim == 3:
        if a.shape[2] == 4:
            a = cv2.cvtColor(a, cv2.COLOR_BGRA2RGBA)
        else:
            a = cv2.cvtColor(a, cv2.COLOR_BGR2RGB)
    display(Image.fromarray(a))

# Run inference on the test image and draw boxes with score above 0.35
frame = cv2.imread(image_path)
meta, res = predictor.inference(frame)
result = overlay_bbox_cv(meta['raw_img'], res, cfg.class_names, score_thresh=0.35)

imshow_scale = 1.0
cv2_imshow(cv2.resize(result, None, fx=imshow_scale, fy=imshow_scale))

[Image: detection result on the test image]
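Note that cv2_imshow above relies on IPython.display, so it only works inside a Jupyter notebook. When running the script from a plain terminal you can simply write the visualization to disk instead:

# Save the visualization to a file instead of displaying it inline
cv2.imwrite('result.jpg', result)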

Reference link

  • https://blog.csdn.net/zicai_jiayou/article/details/110469717

  • https://www.jiqizhixin.com/articles/2020-11-24-5

  • https://blog.csdn.net/qq_34795071/article/details/110083258

  • https://github.com/RangiLyu/nanodet

Origin: https://blog.csdn.net/zhonglongshen/article/details/115141148