Target detection based on Faster R-CNN

1. About the authors

Yang Jinpeng, male, 2022 graduate student, School of Electronic Information, Xi'an Polytechnic University
Research direction: Machine Vision and Artificial Intelligence
Email: [email protected]

Lu Zhidong, male, 2022 graduate student, School of Electronic Information, Xi'an Polytechnic University, member of Zhang Hongwei's artificial intelligence research group
Research direction: Machine Vision and Artificial Intelligence
Email: [email protected]

2. Basic framework of Faster R-CNN

[Figure: basic framework of Faster R-CNN]
The Faster R-CNN detection pipeline can be divided into four main modules (a code sketch mapping them to an implementation follows the list):
(1) Conv layers. The feature extraction network: a stack of conv + relu + pooling layers extracts feature maps from the input image, which are shared by the subsequent RPN and proposal stages.
(2) RPN (Region Proposal Network). The region proposal network replaces the Selective Search of earlier R-CNN versions and generates candidate boxes. It performs two tasks. The first is classification: judging whether each preset anchor is positive or negative, i.e., whether the anchor contains a target (a binary classification). The second is bounding-box regression: correcting the anchors to obtain more accurate proposals. The RPN therefore does part of the detection in advance: it decides whether a target is present (without determining its specific category) and refines the anchors so that the boxes are more accurate.
(3) RoI Pooling. Region-of-interest pooling (the spatial pyramid pooling of SPP-Net) collects the proposals (the coordinates of each box) generated by the RPN, crops the corresponding regions out of the feature maps from (1), and generates proposal feature maps that are sent to the subsequent fully connected layers for classification (determining the specific category) and regression.
(4) Classification and Regression. The proposal feature maps are used to compute the specific category, and a second bounding-box regression yields the final precise position of the detection box.
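The same four modules can be located in torchvision's Faster R-CNN implementation. Below is a minimal sketch, assuming torchvision is installed; the attribute names are those of torchvision's FasterRCNN class, not of the training code used in Section 3.

import torchvision

# Build a Faster R-CNN with a ResNet-50 + FPN backbone (no pretrained weights)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)

# (1) conv layers: the shared feature extraction network
print(model.backbone)

# (2) RPN: anchor classification (positive/negative) plus box regression
print(model.rpn)

# (3) RoI Pooling: crops each proposal out of the shared feature maps
print(model.roi_heads.box_roi_pool)

# (4) Final classification and bounding-box regression heads
print(model.roi_heads.box_head)
print(model.roi_heads.box_predictor)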

3. Model training and testing

3.1 Dataset

Before training, a dataset has to be made and the data prepared. A take-up and pay-off device collects images in real time at a size of 1536×1280; the images are then annotated and finally fed to the network for training. To simulate on-site detection conditions, four common yarn defects are distinguished: loops, hairballs, forks, and hair defects, and the dataset is labeled with annotation software. Annotation files go into VOC2007/Annotations, image files into the VOC2007/JPEGImages directory, and four txt files are generated in ImageSets/Main: trainval.txt (the union of the training and validation sets), train.txt (training set), val.txt (validation set), and test.txt (test set). The ratio of training set to validation set to test set is 6:2:2.
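The four split files can be generated with a short script. The following is a minimal sketch, assuming the annotation XML files already sit in VOC2007/Annotations; the directory names and the 6:2:2 ratio are the ones given above.

import os
import random

ann_dir = "VOC2007/Annotations"
out_dir = "VOC2007/ImageSets/Main"
os.makedirs(out_dir, exist_ok=True)

# One sample per annotation file; shuffle before splitting
names = [f[:-4] for f in os.listdir(ann_dir) if f.endswith(".xml")]
random.shuffle(names)

n = len(names)
n_train, n_val = int(n * 0.6), int(n * 0.2)  # 6:2:2 split
splits = {
    "train": names[:n_train],
    "val": names[n_train:n_train + n_val],
    "test": names[n_train + n_val:],
}
# trainval is the union of the training and validation sets
splits["trainval"] = splits["train"] + splits["val"]

for split, items in splits.items():
    with open(os.path.join(out_dir, split + ".txt"), "w") as f:
        f.write("\n".join(items) + "\n")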
[Figure: dataset annotation example]

3.2 Environment Configuration

[Figures: environment configuration]

3.3 Training parameters

(1) Modify lib/datasets/pascal_voc.py and change the categories to your own (see the sketch after this list). Note that the category names here, and the names used when annotating, should all be lowercase; uppercase names will raise a KeyError.
[Figure: category settings in pascal_voc.py]
(2) Set the relevant parameters in the code according to actual needs and hardware conditions; in particular, --num-classes, --data-path, --weights-path and other parameters need to be modified. The training process is shown in the figures below.
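For item (1), the common Faster R-CNN codebases store the category list in lib/datasets/pascal_voc.py as a tuple named self._classes. The sketch below assumes that layout; the four defect names are illustrative placeholders for the classes described in Section 3.1, written in lowercase as required.

# Hypothetical category tuple for lib/datasets/pascal_voc.py; in the actual
# file it is assigned to self._classes inside the dataset class.
# '__background__' must stay at index 0 and all names must be lowercase.
CLASSES = ('__background__',
           'loop', 'hairball', 'fork', 'hair')

# Whether --num-classes counts the background class depends on the codebase;
# check before setting it.
print(len(CLASSES))  # 5 including background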
[Figures: training process]
The predicted results are shown in the figures below.
[Figures: prediction results]

3.4 Comparison tests with PicoDet and YOLOv3

PicoDet test
[Figures: PicoDet training process and prediction results]

YOLOv3 test
[Figures: YOLOv3 training process and prediction results]

3.5 Code example

import os
import time
import torch
import torchvision.transforms as transforms
import torchvision
from PIL import Image
from matplotlib import pyplot as plt

# Get the directory of the current script
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


# COCO class names (index = label id predicted by the model)
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]


if __name__ == "__main__":

    # Path of the image to detect
    path_img = os.path.join(BASE_DIR, "bear.jpg")

    # Preprocessing
    preprocess = transforms.Compose([
        transforms.ToTensor(),
    ])

    input_image = Image.open(path_img).convert("RGB")
    img_chw = preprocess(input_image)

    # Load a Faster R-CNN model pretrained on COCO
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    # if torch.cuda.is_available():
    #     img_chw = img_chw.to('cuda')
    #     model.to('cuda')

    # Forward pass
    input_list = [img_chw]
    with torch.no_grad():
        tic = time.time()
        print("input img tensor shape:{}".format(input_list[0].shape))
        output_list = model(input_list)
        print("inference time: {:.3f}s".format(time.time() - tic))

    # Parse the result for the single input image: boxes, labels and scores
    output = output_list[0]
    boxes = output["boxes"].cpu().numpy()
    labels = output["labels"].cpu().numpy()
    scores = output["scores"].cpu().numpy()

    # Draw the detections whose confidence exceeds 0.5
    fig, ax = plt.subplots()
    ax.imshow(input_image)
    for box, label, score in zip(boxes, labels, scores):
        if score < 0.5:
            continue
        x1, y1, x2, y2 = box
        ax.add_patch(plt.Rectangle((x1, y1), x2 - x1, y2 - y1,
                                   fill=False, edgecolor="red", linewidth=2))
        ax.text(x1, y1, "{} {:.2f}".format(COCO_INSTANCE_CATEGORY_NAMES[label], score),
                color="red")
    plt.show()

3.6 Problems and Analysis

Training parameter debugging and environment configuration
Before training, the required environment must be configured correctly, and it must be checked whether the versions of the called libraries meet the requirements; sometimes the library versions must correspond to one another exactly. Parameter tuning during training is also very important and strongly affects the detection performance of the model; it takes repeated experiments to explore the influence of different parameters on the model.
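As a quick sanity check, the installed versions can be printed and compared against the requirements of the codebase being used; a minimal sketch:

import torch
import torchvision

# torch and torchvision releases must correspond to each other,
# and the CUDA build must match the installed driver
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())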
Dataset preparation
During dataset collection, external interference should be avoided as much as possible, and attention should be paid to the lighting method and brightness so that the objects and defects to be detected are captured clearly. Although the dataset may not seem the most important part of the whole project, whether it is collected clearly and annotated correctly has a great impact on the detection results.

