Notes on training YOLO v2 (Darknet) on a custom dataset

1. Environment preparation

Darknet official website: https://pjreddie.com/darknet/yolo/

GitHub address: https://github.com/pjreddie/darknet

For Windows version, please refer to: https://github.com/AlexeyAB/darknet

Follow the official instructions: download the source code, compile it, download a weight file pre-trained on the VOC or COCO dataset, and run a target detection test.

If all goes well, you can prepare your own dataset for training.

2. Prepare data

Prepare the training images (weapon image recognition is used as the example here). Follow the VOC dataset format: all samples are in jpg format and are named in the VOC style, e.g. 000001.jpg, 000002.jpg.
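For instance, a minimal Python sketch of the renaming (assuming the raw images sit in a folder called raw_images, a hypothetical name, and should be copied into JPEGImages with zero-padded names) might look like this:

import os
import shutil

# Hypothetical paths: adjust to where your raw images actually live.
src_dir = "raw_images"
dst_dir = "JPEGImages"
os.makedirs(dst_dir, exist_ok=True)

# Collect the jpg files, then copy them under zero-padded VOC-style names.
jpgs = sorted(f for f in os.listdir(src_dir) if f.lower().endswith(".jpg"))
for i, name in enumerate(jpgs, start=1):
    shutil.copy(os.path.join(src_dir, name),
                os.path.join(dst_dir, "%06d.jpg" % i))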

Arrange the image files following the structure of the VOC dataset, which looks like this:

--VOC
    --Annotations
    --ImageSets
      --Main
      --Layout
      --Segmentation
    --JPEGImages
    --SegmentationClass
    --SegmentationObject

The folders used here are Annotations, ImageSets and JPEGImages. Annotations stores the xml files, one per image; each xml records the location and category of every labeled target and is usually named after the corresponding image. Under ImageSets only the Main folder is needed; it holds text files such as train.txt and test.txt, where each line is the name of an image used for training or testing (no suffix, no path). JPEGImages holds the original images named according to the uniform rule above.

      1. Create a new folder WeaponData

      2. Create three new folders Annotations, ImageSets and JPEGImages under the WeaponData folder, and put the prepared original images in the JPEGImages folder

      3. In the ImageSets folder, create three new empty folders: Layout, Main and Segmentation. The text files listing the training, validation and test image names go into Main, named by purpose: train, val and test (see later).

3. Annotate the dataset

After collecting a large amount of data, the images still need to be labeled with object categories and locations before detection can be trained. Following the Pascal VOC convention, the labeled object coordinates are stored in xml files, so for your own data you can mirror the folder and file structure of the Pascal VOC dataset when creating the label files. There are many annotation tools; labelImg is used as the example here.

After annotation is complete, the Annotations folder contains the object annotation information for all images, one xml file per image.

Each xml file records the image size and the category and bounding-box coordinates of every labeled object.
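As a quick sanity check, a rough Python sketch (assuming a standard Pascal VOC xml file such as Annotations/000001.xml, a hypothetical path) that prints the image size and each labeled box could look like this:

import xml.etree.ElementTree as ET

# Parse a single VOC-style annotation file (hypothetical path).
root = ET.parse("Annotations/000001.xml").getroot()

size = root.find("size")
print("image size:", size.find("width").text, "x", size.find("height").text)

# Each <object> carries a class name and a pixel-coordinate bounding box.
for obj in root.findall("object"):
    box = obj.find("bndbox")
    print(obj.find("name").text,
          box.find("xmin").text, box.find("ymin").text,
          box.find("xmax").text, box.find("ymax").text)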

4. Prepare for training

1. Split all samples into training, test and validation sets in a fixed proportion, for example 3:1:1, and generate the three files train.txt, test.txt and val.txt:

You can generate these files in whatever language you are comfortable with; each line in a file is an image file name without extension or path.
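For example, a minimal Python sketch (assuming the images already sit in WeaponData/JPEGImages and using the 3:1:1 ratio) might be:

import os
import random

# Hypothetical locations; adjust to your WeaponData folder.
jpeg_dir = "WeaponData/JPEGImages"
main_dir = "WeaponData/ImageSets/Main"
os.makedirs(main_dir, exist_ok=True)

# Shuffle the image names (without extension) and split them 3:1:1.
names = sorted(os.path.splitext(f)[0] for f in os.listdir(jpeg_dir) if f.endswith(".jpg"))
random.shuffle(names)
n = len(names)
n_train = n * 3 // 5
n_test = n // 5
splits = {
    "train.txt": names[:n_train],
    "test.txt": names[n_train:n_train + n_test],
    "val.txt": names[n_train + n_test:],
}
for filename, subset in splits.items():
    with open(os.path.join(main_dir, filename), "w") as f:
        f.write("\n".join(subset) + "\n")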

2. Upload the WeaponData folder and the Darknet source code to the server (skip this step if you have been working on the server from the start), for example into the /opt/dev/yolo folder, so that /opt/dev/yolo/darknet is the source tree.

3. Create a new folder VOCdevkit under /opt/dev/yolo, copy the entire WeaponData folder prepared earlier into VOCdevkit, and rename it to VOC2018 (if you do not rename it, you will need to modify the voc_label.py script to match your folder name).

mkdir VOCdevkit
mv WeaponData VOCdevkit/VOC2018

4. Download the voc_label.py file to /opt/dev/yolo:

cd /opt/dev/yolo
wget https://pjreddie.com/media/files/voc_label.py

Then modify the sets and classes lines in voc_label.py:
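Roughly, after the edit the two lines would look something like this (a sketch using 2018 as the year tag and the five categories from this example):

sets = [('2018', 'train'), ('2018', 'val'), ('2018', 'test')]

classes = ["aeroplane", "person", "building", "ship", "vehicle"]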

Finally run the script:

python voc_label.py

When it finishes, a labels folder is generated under /opt/dev/yolo/VOCdevkit/VOC2018; it contains one label file per image with the class index and the corresponding normalized location. At the same time, three files, 2018_train.txt, 2018_test.txt and 2018_val.txt, are generated under /opt/dev/yolo; they contain the absolute paths of all samples.
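For reference, the normalization that voc_label.py applies is essentially the following (a sketch of its convert helper): pixel-coordinate boxes become a center point plus width and height, each divided by the image size so all values fall in [0, 1], and each label line stores the class index followed by these four numbers.

def convert(size, box):
    # size = (image_width, image_height), box = (xmin, xmax, ymin, ymax) in pixels
    dw = 1.0 / size[0]
    dh = 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0   # box center x
    y = (box[2] + box[3]) / 2.0   # box center y
    w = box[1] - box[0]           # box width
    h = box[3] - box[2]           # box height
    return (x * dw, y * dh, w * dw, h * dh)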

Then execute the following command to use the contents of 2018_train.txt and 2018_val.txt for training:

cat 2018_train.txt 2018_val.txt > train.txt

5. Configuration file modification

1. Modify /opt/dev/yolo/darknet/data/voc.names to list the target categories that need to be recognized:

aeroplane
person
building
ship
vehicle

2. Modify cfg/voc.data

classes= 5
train  = /opt/dev/yolo/train.txt
valid  = /opt/dev/yolo/2018_test.txt
names = /opt/dev/yolo/darknet/data/voc.names
backup = /opt/dev/yolo/results

Here classes is the number of sample categories, train is the path to the training sample list, valid is the path to the validation sample list, names is the category file, and backup is the directory where weight files are saved during training (create it beforehand).

3. Modify cfg/yolo-voc.2.0.cfg (you can also use cfg/yolo-voc.cfg)

Modify the filters of the last convolutional layer and the classes of the last region layer.
Here filters = num × (classes + coords + 1) = 5 × (5 + 4 + 1) = 50, since there are only 5 categories in this example.

[convolutional]
size=1
stride=1
pad=1
filters=50
activation=linear

[region]
anchors = 1.08,1.19,  3.42,4.41,  6.63,11.38,  9.42,5.11,  16.62,10.52
bias_match=1
classes=5
coords=4
num=5
softmax=1
jitter=.2
rescore=1

……

6. Modify the Makefile

It is best to train with a GPU; otherwise training on the CPU is very slow. GPU support is not enabled by default in the source, which cost me quite a bit of time before I noticed it.

Install cuda+cudnn reference: https://www.cnblogs.com/573177885qq/p/6632576.html

Then modify the Makefile:

cd /opt/dev/yolo/darknet
vim Makefile

Set the first three lines to enable GPU, CUDNN and OpenCV:

GPU=1
CUDNN=1
OPENCV=1

and set NVCC further down to your own CUDA path:

NVCC=/usr/local/cuda-8.0/bin/nvcc

Recompile:

make clean
make

7. Start training

You can download the darknet19 convolutional weights pre-trained on ImageNet (the first 23 layers) to speed up training:

cd /opt/dev/yolo/darknet
wget https://pjreddie.com/media/files/darknet19_448.conv.23

Then execute the following command to start training:

./darknet detector train cfg/voc.data cfg/yolo-voc.2.0.cfg ./darknet19_448.conv.23 

If you need to analyze the training log, pipe the command's output through tee to also write it to a file.

Training takes quite a long time; keep an eye on the loss, and once it has more or less stabilized you can stop training.

Here is an example of the output for each iteration of the training process:

Loaded: 0.000031 seconds
Region Avg IOU: 0.805040, Class: 0.929115, Obj: 0.777778, No Obj: 0.004146, Avg Recall: 0.875000,  count: 8
Region Avg IOU: 0.826887, Class: 0.999643, Obj: 0.778379, No Obj: 0.006632, Avg Recall: 0.916667,  count: 12
Region Avg IOU: 0.760517, Class: 0.999070, Obj: 0.698473, No Obj: 0.004795, Avg Recall: 0.846154,  count: 13
Region Avg IOU: 0.840628, Class: 0.999687, Obj: 0.805357, No Obj: 0.005085, Avg Recall: 0.900000,  count: 10
Region Avg IOU: 0.670166, Class: 0.944164, Obj: 0.620956, No Obj: 0.004349, Avg Recall: 0.777778,  count: 18
Region Avg IOU: 0.849498, Class: 0.999253, Obj: 0.743897, No Obj: 0.006114, Avg Recall: 0.933333,  count: 15
Region Avg IOU: 0.625192, Class: 0.957918, Obj: 0.562712, No Obj: 0.005363, Avg Recall: 0.550000,  count: 20
Region Avg IOU: 0.711634, Class: 0.999687, Obj: 0.687795, No Obj: 0.006114, Avg Recall: 0.941176,  count: 17
29391: 1.344486, 1.478107 avg, 0.000100 rate, 4.674087 seconds, 1881024 images

The output above covers one batch of training images; the batch is split into groups according to the subdivisions parameter set in the .cfg file. In the .cfg file used here, batch=64 and subdivisions=8, so each training iteration prints 8 groups of 8 images each, consistent with the configured batch and subdivisions values.

(Note: in other words, each iteration randomly draws batch=64 samples from the whole training set, and these are fed to the network in subdivisions=8 smaller chunks to reduce memory pressure.)

The meaning of the last line of the output above is as follows:

  •     29391: the current training iteration number
  •     1.344486: the total loss of this batch
  •     1.478107 avg: the running average loss; the lower, the better. As a rule of thumb, training can be stopped once this value drops below about 0.060730 (because the quality of my samples is not great, mine never got that low).
  •     0.000100 rate: the current learning rate, as defined in the .cfg file
  •     4.674087 seconds: the total time spent training the current batch
  •     1881024 images: the total number of images that have been used for training so far

8. Test

After training has run for a while, the intermediate weight files can already be used for recognition tests. For example, to test with the weights saved after 10,000 iterations:

./darknet detector test cfg/voc.data cfg/yolo-voc.2.0.cfg yolo-voc_10000.weights data/person.jpg

Note that you must use the ./darknet detector test form of the command so that your own voc.data and cfg files can be specified.

The detection result will be output to predictions.png (or predictions.jpg), which can be opened to check whether the detection is correct.

 
