Training YOLOv3 on a Custom Dataset for Object Detection

The full name of YOLO is You Only Look Once. It was the earliest single-stage object detection method and the first to achieve real-time object detection. Computer vision comprises two main tasks: image classification and object detection. Image classification distinguishes images by their semantic content, as in face recognition: the model takes an image as input and assigns it to a category.

YOLO is an open-source object detection algorithm. At the time of writing, three versions are in common use: YOLOv1, YOLOv2, and YOLOv3. The core idea of YOLO is to feed the entire image into the network and directly regress bounding box locations and class probabilities.

The overall structure of YOLO is as follows:

The YOLOv1 network is adapted from GoogLeNet. The input image is 448×448, and the output is 7×7×(2×5+20). The image is divided into S×S cells, and predictions are made per cell. If the center of an object falls in a cell, that cell is responsible for predicting the object. Each cell predicts B boxes (each box comprising coordinates, width, and height) along with a confidence score for each box, i.e., B×(4+1) values per cell, plus C conditional class probabilities (one per object class). The output dimension of the network is therefore S×S×(B×5+C). Although each cell is responsible for predicting only one object, it can predict multiple boxes.
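Plugging in YOLOv1's values (S = 7, B = 2, C = 20) confirms the 7×7×30 output quoted above; a quick check in Python:

```python
# YOLOv1 output-tensor size: an S x S grid, B boxes per cell
# (4 coordinates + 1 confidence each), plus C class probabilities per cell.
S, B, C = 7, 2, 20

per_cell = B * (4 + 1) + C      # values predicted by each grid cell
output_dim = S * S * per_cell   # total size of the network output

print(per_cell)    # 30
print(output_dim)  # 1470
```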

So how to use YOLOv3 to train your own model?

First, a batch of image data is needed, such as the following data:

The dataset consists of 100 images covering two classes, cats and dogs. Next, the images must be labeled, i.e., the location of each cat and each dog in every picture must be recorded; a tool such as LabelImg can be used to draw the labels.

After clicking save, a label file ending in .xml is generated, with content like the following:
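The original file is not reproduced here; a LabelImg annotation in the Pascal VOC format looks roughly like this (the file name, image size, and coordinates below are made-up examples, not values from the original post):

```xml
<annotation>
    <folder>images</folder>
    <filename>cat_001.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>cat</name>
        <bndbox>
            <xmin>120</xmin>
            <ymin>80</ymin>
            <xmax>430</xmax>
            <ymax>390</ymax>
        </bndbox>
    </object>
</annotation>
```

Each labeled object gets its own `<object>` element holding the class name and the corners of its bounding box.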

Since the picture contains only two key objects, the file stores just two entries, each holding the coordinates of its object; the label for this picture is now generated.

Next, divide the data into training, test, and validation sets by executing the following script:
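The original script is not reproduced here; a minimal sketch of the split step might look like this, assuming the Pascal VOC directory layout (`Annotations/`, `ImageSets/Main/`). The 80/10/10 ratios are arbitrary choices, not values from the original post:

```python
# Hypothetical split script: shuffles the annotated image IDs and writes
# train/val/test ID lists in the Pascal VOC ImageSets/Main style.
import os
import random

def split_ids(ids, train_ratio=0.8, val_ratio=0.1, seed=0):
    """Shuffle image IDs and split them into train/val/test lists."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    n_val = int(len(ids) * val_ratio)
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])

if __name__ == "__main__" and os.path.isdir("Annotations"):
    out_dir = os.path.join("ImageSets", "Main")
    os.makedirs(out_dir, exist_ok=True)

    ids = [os.path.splitext(f)[0] for f in os.listdir("Annotations")
           if f.endswith(".xml")]
    for name, subset in zip(("train", "val", "test"), split_ids(ids)):
        with open(os.path.join(out_dir, name + ".txt"), "w") as fh:
            fh.write("\n".join(subset))
```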

Finally, convert the Pascal VOC .xml annotations into the annotation format YOLO needs by executing the following script:
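The conversion script itself is not reproduced here; a sketch of the core step, in the spirit of the widely used keras-yolo3 repository's `voc_annotation.py`, is shown below. Each output line is an image path followed by one `xmin,ymin,xmax,ymax,class_id` group per labeled object; the class list matches the cat/dog example above:

```python
# Hypothetical VOC-to-YOLO conversion step: parse one Pascal VOC .xml
# annotation and emit a single YOLO training line for that image.
import xml.etree.ElementTree as ET

CLASSES = ["cat", "dog"]  # the two classes in this example dataset

def xml_to_line(xml_text, image_path):
    """Convert one Pascal VOC annotation into a YOLO training line."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        cls_id = CLASSES.index(obj.find("name").text)
        bb = obj.find("bndbox")
        coords = [bb.find(k).text for k in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(",".join(coords + [str(cls_id)]))
    return image_path + " " + " ".join(boxes)
```

Running this over every .xml file and writing one line per image produces the train.txt-style annotation file that YOLO training code of this kind reads.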

After data preparation is complete, model training can be carried out.

The training environment is as follows:

Operating system: macOS 10.15.6

Python: 3.7.6

TensorFlow: 1.13

Keras: 2.15

OpenCV: 4.3.0

Part of the training code is as follows:
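The post's training code is not reproduced here; the sketch below shows a typical Keras-style training loop for this setup. The helper names (`create_model`, `data_generator_wrapper`, `get_classes`, `get_anchors`) assume the widely used qqwweee/keras-yolo3 repository and are not from the original post:

```python
# Hypothetical training sketch, assuming the qqwweee/keras-yolo3 repo layout
# and the Keras 2.x / TensorFlow 1.x environment listed above.

def read_lines(path):
    """Read one annotation line per image from a train.txt-style file."""
    with open(path) as fh:
        return [ln.strip() for ln in fh if ln.strip()]

def split_train_val(lines, val_ratio=0.1):
    """Hold out the last val_ratio of the annotation lines for validation."""
    n_val = int(len(lines) * val_ratio)
    return lines[:len(lines) - n_val], lines[len(lines) - n_val:]

def main():
    # These imports require the keras-yolo3 repo and its model_data/ files.
    from keras.optimizers import Adam
    from keras.callbacks import ModelCheckpoint
    from train import (create_model, data_generator_wrapper,
                       get_anchors, get_classes)  # repo helpers

    class_names = get_classes("model_data/voc_classes.txt")
    anchors = get_anchors("model_data/yolo_anchors.txt")
    input_shape = (416, 416)  # must be a multiple of 32

    model = create_model(input_shape, anchors, len(class_names),
                         freeze_body=2,
                         weights_path="model_data/yolo_weights.h5")
    # The repo packs the YOLO loss into the model's output tensor.
    model.compile(optimizer=Adam(lr=1e-3),
                  loss={"yolo_loss": lambda y_true, y_pred: y_pred})

    train_lines, val_lines = split_train_val(read_lines("train.txt"))
    batch_size = 8
    model.fit_generator(
        data_generator_wrapper(train_lines, batch_size, input_shape,
                               anchors, len(class_names)),
        steps_per_epoch=max(1, len(train_lines) // batch_size),
        validation_data=data_generator_wrapper(val_lines, batch_size,
                                               input_shape, anchors,
                                               len(class_names)),
        validation_steps=max(1, len(val_lines) // batch_size),
        epochs=50,
        callbacks=[ModelCheckpoint("logs/weights.h5",
                                   save_best_only=True)])
```

`main()` is deliberately not invoked here, since it needs the repo's pretrained weights and the annotation files prepared above; calling it launches training and checkpoints the best weights under logs/.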

After training completes, the model's weight file and a visualization of the network structure are produced under the logs directory. Part of the model structure is as follows:

The model test results are as follows:

 

 

Origin blog.csdn.net/gf19960103/article/details/109354781