YoloV3 case

learning objectives

  • Become familiar with the use of TFRecord files
  • Understand the YoloV3 model structure and how to build it
  • Understand how to process the data
  • Be able to use the yoloV3 model for training and prediction

[Figure: 2818543934-5e4ba4d2823ec.png]

data collection

Based on the business scenario to be realized, a large amount of image data needs to be collected. Generally speaking, there are two major sources: one part is network data, either open-source datasets or images obtained through Baidu and Google image crawlers; the other part is video recordings of the user's scenario, which usually contributes the larger amount of data. Open-source data does not need to be labeled, but crawled data and video recordings do. For labeling we can use the open-source tool labelImg; a screenshot of the software is shown below:

[Figure: labelImg interface (image-20201231173621946.png)]

For the specific steps, see: Using labelImg to label images under Windows (https://www.pianshen.com/article/5220613836/)

After the data labeling is completed, the data can be used for model training. In the next lesson we will use the labeled data for model training and prediction. The project used is structured as follows:

[Figure: project structure (image-20201231174145613.png)]

The main contents are:

1. config: the network configuration (anchors and class information)

2. core: the loss function computation and the network prediction logic

3. dataset: data processing

4. model: construction of the model

5. utils: auxiliary utilities, including loading of anchors, class information, etc.

6. weights: a pre-trained model trained on the coco dataset

TFRecord files

In this case we still use the VOC dataset for object detection. The difference is that we use TFRecord files to store and read the data, so first let's look at the relevant background on TFRecord files.

Why use TFRecord files?

  • TFRecord is the data format and storage tool officially recommended by Google, tailor-made for TensorFlow.
  • TFRecord standardizes the way data is read and written, which significantly improves the efficiency of data reading and processing.

What is a TFRecord file?

TFRecord is a data format officially recommended by Google, designed specifically for TensorFlow; storing data this way makes it fit the framework better. A TFRecord file is a binary file, which makes better use of storage space, and is similar in role to csv and hdf5 files.

The content of the TFRecord file is shown in the following figure:

[External link picture transfer failed, the source site may have an anti-theft link mechanism, it is recommended to save the picture and upload it directly (img-4Fm17zs4-1646365296524)(note picture/image-20200923142227273.png)]

A TFRecord file contains multiple tf.train.Example records, each generally corresponding to one piece of image data. An Example message body contains a series of tf.train.Feature attributes, and each feature is a key-value pair, where the key is the feature name and the value is the feature content.
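
As a concrete illustration, here is a minimal sketch of building one Example for an image sample; the field names image, width and height are illustrative assumptions, not necessarily this project's actual schema:

import tensorflow as tf

# One Example per image: each feature is a key-value pair
example = tf.train.Example(features=tf.train.Features(feature={
    'image':  tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'raw image bytes'])),
    'width':  tf.train.Feature(int64_list=tf.train.Int64List(value=[416])),
    'height': tf.train.Feature(int64_list=tf.train.Int64List(value=[416])),
}))
# Serialize the Example to the binary string stored in the TFRecord file
serialized = example.SerializeToString()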

TFRecord is not the only data format supported by TensorFlow; other formats such as CSV or plain text can be used as well. But for TensorFlow, TFRecord is the most friendly and convenient, and TensorFlow provides a rich API to help us easily create and read TFRecord files.
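
For a taste of that API, the sketch below writes the Example serialized above and reads it back; the file name demo.tfrecords and the feature description are illustrative assumptions:

# Writing: tf.io.TFRecordWriter appends serialized Examples to a file
with tf.io.TFRecordWriter('demo.tfrecords') as writer:
    writer.write(serialized)

# Reading: tf.data.TFRecordDataset yields the serialized strings back,
# and tf.io.parse_single_example restores the features from a description
feature_description = {
    'image':  tf.io.FixedLenFeature([], tf.string),
    'width':  tf.io.FixedLenFeature([], tf.int64),
    'height': tf.io.FixedLenFeature([], tf.int64),
}
dataset = tf.data.TFRecordDataset('demo.tfrecords')
parsed = dataset.map(lambda x: tf.io.parse_single_example(x, feature_description))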

Convert data to TFRecord files

For medium and large datasets, Google officially recommends converting the dataset to TFRecord first, which speeds up data reading and preprocessing. Next we convert the VOC dataset to this format and write the data into a TFRecords file, which can be done directly with write_to_tfrecord. First, import the required tools:

from dataset.vocdata_tfrecord import load_labels,write_to_tfrecord
import os

The process of writing the data into a tfrecord file is:

  1. Specify the path of the dataset to be written
  2. Get all the XML annotation files
  3. Specify the storage location of the tfrecord file
  4. Get the path to the images
  5. Write the data into the tfrecord file

The implementation is as follows:

# Specify the path of the dataset to be written
data_path = '/Users/yaoxiaoying/Desktop/yoloV3-tf2/dataset/VOCdevkit/VOC2007'
# Get all the XML annotation files
all_xml = load_labels(data_path, 'train')
# Specify the storage location of the tfrecord file
tfrecord_path = 'voc_train.tfrecords'
# Get the path to the images
voc_img_path = os.path.join(data_path, 'JPEGImages')
# Write the data into the tfrecord file
write_to_tfrecord(all_xml, tfrecord_path, voc_img_path)

The result looks like this:

[Figure: the generated voc_train.tfrecords file]

Read TFRecord files

The VOC dataset has been written into the TFRecord file; now we need to read the data back from it. The data can be read easily using getdata.

Import the required packages:

# Packages required to read the tfrecords file
from dataset.get_tfdata import getdata
# Plotting
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

Then use getdata to get all the data in the file:

# Specify the location of the tfrecord file and get the data in it
datasets = getdata("dataset/voc_val.tfrecords")

We will display the data read from the TFRecord file:

from matplotlib.patches import Rectangle
# Data classes
from utils.config_utils import read_class_names
classes = read_class_names("config/classname")
# Display the images stored in the tfrecord file
plt.figure(figsize=(15, 10))
# Initialize: index of the current image
i = 0
# Take 3 samples from datasets: image, size, box annotations and class info
for image, width, height, boxes, boxes_category in datasets.take(3):
    # Select the subplot
    plt.subplot(1, 3, i+1)
    # Draw the image
    plt.imshow(image)
    # Get the current axes
    ax = plt.gca()
    # Iterate over all the boxes
    for j in range(boxes.shape[0]):
        # Create the rectangle
        rect = Rectangle((boxes[j, 0], boxes[j, 1]), boxes[j, 2] - boxes[j, 0], boxes[j, 3] - boxes[j, 1], color='r', fill=False)
        # Show the rectangle on the image
        ax.add_patch(rect)
        # Show the annotation
        # Get the id of the label
        label_id = boxes_category[j]
        # Get the label name
        label = classes.get(label_id.numpy())
        # Add the label to the image
        ax.text(boxes[j, 0], boxes[j, 1] + 8, label, color='w', size=11, backgroundcolor="none")
    # Next result
    i += 1
# Show the figure
plt.show()

The result is:

[Figure: images and annotation boxes read from the TFRecord file (image-20210106151241266.png)]

data processing

The input image size of the yoloV3 model must be a multiple of 32, so we need to process the images. Here we resize each image to 416x416; to keep the aspect ratio, the surrounding pixels are padded with a gray value of 128, as shown in the following figure:

[Figure: image resized to 416x416 with gray padding (image-20210106145050791.png)]

This is implemented with dataset.preprocess, as shown below:

# Input: the original image and its annotation boxes
# Output: the resized image and the correspondingly adjusted boxes
image, bbox = preprocess(oriimage, oribbox, input_shape=(416, 416))
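
Under the hood this is the classic letterbox resize. A minimal sketch of the idea, assuming numpy/OpenCV and boxes in [x1, y1, x2, y2] form (the project's own implementation in dataset.preprocess may differ in details such as batching and normalization):

import cv2
import numpy as np

def letterbox(image, boxes, input_shape=(416, 416)):
    h, w = input_shape
    ih, iw = image.shape[:2]
    # Scale factor that keeps the aspect ratio
    scale = min(w / iw, h / ih)
    nw, nh = int(iw * scale), int(ih * scale)
    resized = cv2.resize(image, (nw, nh))
    # Canvas filled with the gray value 128, resized image pasted in the center
    canvas = np.full((h, w, 3), 128, dtype=resized.dtype)
    dx, dy = (w - nw) // 2, (h - nh) // 2
    canvas[dy:dy + nh, dx:dx + nw] = resized
    # The boxes are scaled and shifted the same way as the image
    boxes = boxes * scale
    boxes[:, [0, 2]] = boxes[:, [0, 2]] + dx
    boxes[:, [1, 3]] = boxes[:, [1, 3]] + dy
    return canvas, boxes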

We process the read data and plot the results:

# 1. Import the preprocessing function
from dataset.preprocess import preprocess
# 2. Create the figure
plt.figure(figsize=(15, 10))
# 3. Iterate over the data
i = 0
for image, width, height, boxes, boxes_category in datasets.take(3):
    # 4. Preprocess the data
    image, boxes = preprocess(image, boxes)
    # 5. Select the subplot: subplot()
    plt.subplot(1, 3, i+1)
    # 6. Show the image: plt.imshow()
    plt.imshow(image[0])
    # 7. Show the boxes: iterate over all the bboxes and draw rectangles
    ax = plt.gca()
    for j in range(boxes.shape[0]):
        rect = Rectangle((boxes[j, 0], boxes[j, 1]), boxes[j, 2] - boxes[j, 0], boxes[j, 3] - boxes[j, 1], color='r', fill=False)
        ax.add_patch(rect)
        # 8. Show the class label
        label_id = boxes_category[j]
        label = classes.get(label_id.numpy())
        ax.text(boxes[j, 0], boxes[j, 1] + 8, label, color='w', size=11, backgroundcolor="none")
    i += 1
plt.show()

model building

The model structure of yoloV3 is shown below. The entire v3 structure contains no pooling layers and no fully connected layers; downsampling is achieved by setting the stride of a convolution to 2, so the feature map is halved after passing through such a convolutional layer.

[Figure: yoloV3 network structure (image-20210106154747841.png)]
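
The repeated building block behind that diagram is a convolution followed by batch normalization and a LeakyReLU activation; stride-2 variants of the same block do the downsampling. A rough Keras sketch of such a block (names here are illustrative, not the project's model code):

import tensorflow as tf

def conv_bn_leaky(x, filters, kernel_size, down_sample=False):
    # A stride-2 convolution halves the feature map instead of pooling
    if down_sample:
        x = tf.keras.layers.ZeroPadding2D(((1, 0), (1, 0)))(x)
    x = tf.keras.layers.Conv2D(filters, kernel_size,
                               strides=2 if down_sample else 1,
                               padding='valid' if down_sample else 'same',
                               use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.LeakyReLU(alpha=0.1)(x)

# A 416x416 input passes one downsampling block and comes out as 208x208
inputs = tf.keras.Input(shape=(416, 416, 3))
x = conv_bn_leaky(inputs, 32, 3)               # -> (416, 416, 32)
x = conv_bn_leaky(x, 64, 3, down_sample=True)  # -> (208, 208, 64)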

When building the network, we use model.yoloV3:

# Import the model
from model.yoloV3 import YOLOv3
# Model instantiation: specify the input image size and the number of classes
yolov3 = YOLOv3((416, 416, 3), 80)
# Print the model architecture
yolov3.summary()

[Figure: yolov3.summary() output (image-20210106154444551.png)]

With that, the model construction is complete.

model training

Calculation of the loss function

The loss function of YoloV3 is divided into three parts:

  • Box loss:

Only the anchors in the grid cell responsible for detection are included in this loss; the mean squared error is computed separately for x, y, w, and h (reconstructed formulas for all three terms follow below).

[Formula image: box loss (image-20200923173056196.png)]

  • Confidence loss:

The confidence loss is a binary cross-entropy loss, and all boxes are included in its calculation.

[Formula image: confidence loss (image-20200923172717434.png)]

  • Classification loss:

The classification loss is also a binary cross-entropy loss, computed only for the anchors responsible for detecting a target.

[Formula image: classification loss (image-20200923172749873.png)]
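
Since the three formula images above did not survive, here is a reconstruction of the three terms in the standard YOLO notation (hatted symbols are predictions, $1_{ij}^{obj}$ selects the anchor responsible for a target; the exact weighting used in core.loss may differ):

L_{box} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{ij}^{obj} \left[ (x_{ij} - \hat{x}_{ij})^2 + (y_{ij} - \hat{y}_{ij})^2 + (w_{ij} - \hat{w}_{ij})^2 + (h_{ij} - \hat{h}_{ij})^2 \right]

L_{conf} = - \sum_{i=0}^{S^2} \sum_{j=0}^{B} \left[ C_{ij} \log \hat{C}_{ij} + (1 - C_{ij}) \log(1 - \hat{C}_{ij}) \right]

L_{cls} = - \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{ij}^{obj} \sum_{c \in classes} \left[ p_{ij}(c) \log \hat{p}_{ij}(c) + (1 - p_{ij}(c)) \log(1 - \hat{p}_{ij}(c)) \right]

L = L_{box} + L_{conf} + L_{cls}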

The loss is computed with core.loss:

# Import the required tools
from core.loss import Loss
# Instantiate the loss
yolov3_loss = Loss((416, 416, 3), 80)

Let's look at the input and output of the loss:

# Loss inputs
yolov3_loss.inputs

[Figure: yolov3_loss.inputs (image-20210106160029787.png)]

# Loss outputs
yolov3_loss.outputs

[Figure: yolov3_loss.outputs (image-20210106160114501.png)]

The output is the loss value of the network, a scalar, which is used to train the network.

Positive and negative sample settings

In the loss calculation above, the anchors responsible for predicting a target are positive samples, and those not responsible for any target are negative samples, i.e. background. So how are positive and negative samples assigned? As shown below:

[Figure: positive and negative sample assignment (image-20200924115029615.png)]

  • Positive samples: first compute which grid cell the center of the target falls in, then compute the IOU between the target's ground-truth box and each of the three prior boxes (anchors) of that grid cell. The anchor with the largest IOU is matched to the target and becomes responsible for predicting it: this anchor is a positive sample, its confidence is set to 1, and the other target values are filled in from the label information.
  • Negative samples: all anchors that are not positive samples are negative samples. Their confidence is set to 0 and participates in the loss calculation; their other values do not participate in the loss and default to 0.

For each anchor we need a 4+1+80 dimensional target value: the first 4 dimensions are the coordinates (for a positive sample, the values of the GT bbox), the 5th dimension is the confidence (set to 1 for positive samples and 0 for negative samples), and the last 80 dimensions are the class scores (the class of a positive sample is set to 1, the rest to 0). When the voc dataset is used, the number of classes is 20, so the target value is 4+1+20 dimensional.

[Figure: target value layout for each anchor (image-20210107114405658.png)]
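
The matching rule is simple enough to sketch directly. The project's real implementation is core.bbox_target.bbox_to_target, so the helper below is only an illustration; both boxes are compared as if centered at the same point, so the IOU depends only on width and height:

import numpy as np

def match_anchor(gt_wh, anchors_wh):
    # Intersection/union of boxes that share the same center
    inter = np.minimum(gt_wh[0], anchors_wh[:, 0]) * np.minimum(gt_wh[1], anchors_wh[:, 1])
    union = gt_wh[0] * gt_wh[1] + anchors_wh[:, 0] * anchors_wh[:, 1] - inter
    # Index of the prior box with the largest IOU
    return np.argmax(inter / union)

# The three 13x13-scale anchors (w, h) of the standard yoloV3 coco configuration
anchors = np.array([[116, 90], [156, 198], [373, 326]], dtype=np.float32)
best = match_anchor(np.array([88., 108.]), anchors)  # a GT box of 88x108 matches anchor 0

This is consistent with the example below, where the positive sample of the 13x13 target sits at anchor index 0.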

The sample assignment is done with bbox_to_target: take an image with its annotation information and obtain the target values, as follows:

# Import the method for setting the target values
from core.bbox_target import bbox_to_target
# Get an image and its annotation information
for image, width, height, boxes, labels in datasets.take(1):
    # Get the anchor target values: label1 is the 13x13 target, label2 the 26x26 target, label3 the 52x52 target
    label1, label2, label3 = bbox_to_target(bbox=boxes, label=labels, num_classes=20)

[Figure: bbox_to_target output (image-20210106172406512.png)]

The confidence of a positive-sample anchor is 1, so we locate the positive samples by selecting the entries whose confidence equals 1:

# Import tensorflow
import tensorflow as tf
# label1[..., 0:4] are the coordinates, label1[..., 4] the confidence, label1[..., 5:] the class scores
index = tf.where(tf.equal(label1[..., 4], 1))
# index.numpy() shows that anchor 0 of grid cell (12, 12) is the positive sample:
# array([[12, 12,  0]])

Its corresponding coordinate value is:

label1[12, 12, 0, 0:4].numpy()
# array([209., 318.,  88., 108.], dtype=float32)

The target values for classification are:

label1[12, 12, 0, 5:].numpy()
# array([0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0.], dtype=float32)

We plot the target value on the image:

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
# 1. Get the class information
from utils.config_utils import read_class_names
classes = read_class_names('config/classname')
# 2. Create the figure
plt.figure(figsize=(15, 10))
# 3. Iterate over the data
for image, width, height, boxes, boxes_category in datasets.take(1):
    # 4. Show the image: plt.imshow()
    plt.imshow(image)
    # 5. Show the boxes: iterate over all the bboxes and draw rectangles
    ax = plt.gca()
    for j in range(boxes.shape[0]):
        rect = Rectangle((boxes[j, 0], boxes[j, 1]), boxes[j, 2] - boxes[j, 0], boxes[j, 3] - boxes[j, 1], color='r', fill=False)
        ax.add_patch(rect)
        # 6. Show the class label
        label_id = boxes_category[j]
        label = classes.get(label_id.numpy())
        ax.text(boxes[j, 0], boxes[j, 1] + 8, label, color='w', size=11, backgroundcolor="none")
    # 7. Draw the target value of the positive-sample anchor (stored as center x, center y, w, h)
    anchor = label1[12, 12, 0, 0:4].numpy()
    rect2 = Rectangle((anchor[0] - anchor[2]/2, anchor[1] - anchor[3]/2), anchor[2], anchor[3], color='g', fill=False)
    ax.add_patch(rect2)
plt.show()

[Figure: GT box (red) and the matched anchor target (green) (image-20210106172231076.png)]

model training

We have introduced the network architecture in detail above. Before the network can predict, it has to be trained. Next we train the model end to end. The basic steps are:

1. Load the dataset: we use the VOC dataset here, so we need to load it from the TFRecord file

2. Model instantiation: instantiate the yoloV3 model and its loss function

3. Model training: compute the loss function and train the model with the backpropagation algorithm

get dataset

We get the training set data from the tfrecords file:

# Import the dataset helper
from dataset.preprocess import dataset
# Set the batch_size
batch_size = 1
# Get the training data with the specified batch size
trainset = dataset("dataset/voc_train.tfrecords", batch_size)

load model

Instantiate the yoloV3 model and the loss computation:

# Instantiate the V3 model: specify the input image size and the number of detection classes
yolov3 = YOLOv3((416, 416, 3), 20)
yolov3_loss = Loss((416, 416, 3), 20)

model training

Model training uses the loss function and backpropagation, with the optimizer updating the parameters. The training process is:

1. Specify the optimizer: here we use SGD with momentum

2. Set the number of epochs, then iterate over the batches and feed them into the network for prediction

3. Compute the loss function and update the parameters with backpropagation. We use tf.GradientTape:

  • Open the context: tf.GradientTape
  • Compute the loss function loss
  • Use tape.gradient(loss, model.trainable_variables) to compute the gradients automatically; loss is the loss result, and trainable_variables are all the variables to be trained.
  • Use optimizer.apply_gradients(zip(grads, model.trainable_variables)) to update the model parameters automatically; zip(grads, model.trainable_variables) pairs each gradient with its parameter, and apply_gradients then applies the gradients to update the parameters.

Next, we follow this process to complete the model training and save the model training results.

# Required for logging the average loss
import numpy as np

# 1. Define the optimizer: SGD with momentum
optimizer = tf.keras.optimizers.SGD(0.1, 0.9)
# 2. Set the epochs and feed each batch into the network
for epoch in range(300):
    loss_history = []
    # Iterate over the images and target values of every batch and update the parameters
    for (batch, inputs) in enumerate(trainset):
        images, labels = inputs
        # 3. Compute the loss function and update the parameters with backpropagation
        # 3.1 Open the GradientTape context
        with tf.GradientTape() as tape:
            # 3.2 Feed the images into the network
            outputs = yolov3(images)
            # 3.3 Compute the loss function
            loss = yolov3_loss([*outputs, *labels])
        # 3.4 Compute the gradients (outside the tape context)
        grads = tape.gradient(loss, yolov3.trainable_variables)
        # 3.5 Apply the gradient update
        optimizer.apply_gradients(zip(grads, yolov3.trainable_variables))
        # 3.6 Record the loss and print progress
        loss_history.append(loss.numpy())
        info = 'epoch: %d, batch: %d ,loss: %f' % (epoch, batch, np.mean(loss_history))
        print(info)
# Save the trained weights
yolov3.save('yolov3.h5')

The change in the loss function is:

epoch: 0, batch: 0 ,loss: 701318.312500
epoch: 0, batch: 1 ,loss: 765384.625000
epoch: 0, batch: 2 ,loss: 747363.000000
epoch: 0, batch: 3 ,loss: 708547.187500
epoch: 0, batch: 4 ,loss: 699261.500000
epoch: 0, batch: 5 ,loss: 727906.812500
epoch: 0, batch: 6 ,loss: 696439.875000
epoch: 0, batch: 7 ,loss: 669801.500000
epoch: 0, batch: 8 ,loss: 669526.875000

Once the model is trained, we can use it to make predictions.

model prediction

We now use the trained model to make predictions: the yoloV3 model predicts on an image, and the prediction results are drawn onto it. First import the required packages. The pre-trained model was trained on the coco dataset, so we specify the corresponding class information:

# Packages for reading images and plotting
import cv2
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
# The yoloV3 predictor
from core.predicter import Predictor

# Class information of the coco dataset
classes = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 
           'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 
           'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
           'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
           'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 
           'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 
           'skateboard', 'surfboard','tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 
           'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 
           'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 
           'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 
           'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 
           'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

The whole process is:

1. Read the image on which to perform object detection

2. Instantiate the yoloV3 predictor and load the pre-trained model

3. Use the predictor to perform object detection on the image

4. Draw the detection results on the image

The implementation is as follows:

# 1. Read the image
img = cv2.imread("image.jpg")
# 2. Instantiate the predictor and load the pre-trained model
predictor = Predictor(class_num=80, yolov3="weights/yolov3.h5")
# 3. Get the detection results
boundings = predictor.predict(img)
# 4. Draw the detection results on the image
# 4.1 Show the image (BGR -> RGB)
plt.imshow(img[:, :, ::-1])
# Get the current axes
ax = plt.gca()
# 4.2 Iterate over the detection boxes and draw them on the image
for bounding in boundings:
    # Create the rectangle
    rect = Rectangle((bounding[0].numpy(), bounding[1].numpy()), bounding[2].numpy(
    ) - bounding[0].numpy(), bounding[3].numpy() - bounding[1].numpy(), color='r', fill=False)
    # Show the rectangle on the image
    ax.add_patch(rect)
    # Show the class information
    # Get the id of the class
    label_id = bounding[5].numpy().astype('int32')
    # Get the class name
    label = classes[label_id]
    # Add the label to the image
    ax.text(bounding[0].numpy(), bounding[1].numpy() + 8,
            label, color='w', size=11, backgroundcolor="none")
# Show the figure
plt.show()

The prediction results are shown in the figure below:

[Figure: detection results on the test image (image-20200923165753557.png)]

Summary

  • Become familiar with the use of TFRecord files

TFRecord is the data format and storage tool officially recommended by Google, tailor-made for TensorFlow. A TFRecord file contains multiple tf.train.Example records, each generally corresponding to one piece of image data; an Example message body contains a series of tf.train.Feature attributes, and each feature is a key-value pair.

  • Understand the YoloV3 model structure and how to build it

Construction of the basic components, the backbone, the outputs, the full yoloV3 model, and the conversion of the output values

  • Understand how to process the data

Know how to resize the image while keeping the aspect ratio, and how to pad it

  • Be able to use the yoloV3 model for training and prediction

Know the loss function, the positive and negative sample assignment, and the processes of training and prediction.
