Deep Dive Series - Object Detection History (7): YOLO-V3 Object Detection in Detail - Code Walkthrough

Back to the main table of contents

Back to the Object Detection History table of contents

Previous chapter: Deep Dive Series - Object Detection History (6): YOLO-V3 Object Detection in Detail

Next chapter: Deep Dive Series - Object Detection History (8): CornerNet-Lite Object Detection in Detail

Paper: "YOLO-V3"

Code: tf_yolov3_pro

This section walks through the YOLO-V3 object detection code in detail; the next section covers CornerNet-Lite object detection.

VIII. YOLO-V3 Object Detection: Code Walkthrough

The code walkthrough lives here rather than in the previous section because putting it there would have made that post very long, inconvenient and tiring to read, and perhaps even rage-inducing. The code is tightly coupled to the theory covered earlier, and I have commented it heavily, explaining how the relevant theory maps onto the implementation.

 

1. The code structure is as follows:

For details, see README.md.

# [tf_yolov3_pro](https://github.com/wandaoyi/tf_yolov3_pro)
A TensorFlow implementation of YOLOv3 object detection, 2020-03-18

- [Paper](https://pjreddie.com/media/files/papers/YOLOv3.pdf)
- [My CSDN blog](https://blog.csdn.net/qq_38299170) 
- Dependencies (version requirements are not strict; if your versions run, that's fine too):
```bashrc
pip install easydict
pip install numpy==1.16
conda install tensorflow-gpu==1.13.1
pip install tqdm
pip install opencv-python
```
- As for data, you can train on your own, or download an open dataset from the web
- Links to the open dataset:
```bashrc
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
$ wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
```
- Put the data in the directories specified in config.py:
```bashrc
# image path
__C.COMMON.IMAGE_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "data/images")
# xml path
__C.COMMON.ANNOTATION_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "data/annotations")
```
- Once the dependencies are installed and the data is in place, read config.py carefully: it contains all the configuration. After the paths and hyperparameters are set, almost everything that follows is a one-click run.
- Set up the configuration in config.py.

## Data generation
Before training the model we need to prepare the data, i.e. shape it into the form the network expects.
- Following the hints above, put the raw data in the specified paths (or point the paths in config.py at your raw data), and set the target data paths too. Then just run prepare.py to generate the .txt target data (an example line is shown below).
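- Each line of the generated .txt file holds one image path followed by its boxes as xmin,ymin,xmax,ymax,class_id (the format that parse_annotation() in dataset.py consumes); for example:
```bashrc
./data/images/Anime_180.jpg 388,532,588,729,0 917,154,1276,533,0
```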

## Training the model
- With config.py tuned and the data generated, let's see whether this horse can actually run: just run yolo_train.py.
- During training, don't make batch_size too small, or the loss will have trouble converging; batch_size = 1 and batch_size = 8 behave quite differently.
- During training you can select checkpoints manually based on the loss and the logs.

## Freezing the model
- Freeze the trained .ckpt model file into a .pb file by running model_freeze.py (the core idea is sketched below).
- Freezing compresses the model to some extent, with almost no loss of accuracy.
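- For reference, the heart of such a freeze script is usually TF 1.x's convert_variables_to_constants; a minimal sketch under the config names used below (an illustration, not necessarily the repo's exact model_freeze.py):
```bashrc
import tensorflow as tf
from config import cfg

with tf.Session() as sess:
    # restore the checkpoint graph and weights
    saver = tf.train.import_meta_graph(cfg.FREEZE.CKPT_MODEL_PATH + ".meta")
    saver.restore(sess, cfg.FREEZE.CKPT_MODEL_PATH)
    # bake the variables into constants, keeping only the listed output nodes
    frozen_graph = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, cfg.FREEZE.YOLO_OUTPUT_NODE_NAME)
    with tf.gfile.GFile(cfg.FREEZE.PB_MODEL_PATH, "wb") as f:
        f.write(frozen_graph.SerializeToString())
```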

## Image prediction
- Run yolo_test.py (you can run prepare.py to generate the data you want, provided config.py is configured).

## Video prediction
- Run yolo_video.py.

## Strengths of this project
- Convenience: many steps are packaged as foolproof one-click operations. If you dislike relative paths, you can switch to absolute paths in config.py.
- I am rather talkative: nearly all of the code carries comments, in case anything is unclear. I simply hope it offers each of you a little help.

## Limitations of this project
- No mAP evaluation
- No multithreading (I may cover these in detail in a future blog post)

After README.md, go through the config.py configuration file; once it is configured, nearly everything afterwards is a one-click operation.

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/03/07 14:09
# @Author   : WanDaoYi
# @FileName : config.py
# ============================================

import os
from easydict import EasyDict as edict


__C = edict()
# Consumers can get config by: from config import cfg
cfg = __C

# common options: shared configuration
__C.COMMON = edict()
# On Windows, resolve the absolute path of this file, so the project can run from a console window
__C.COMMON.BASE_PATH = os.path.abspath(os.path.dirname(__file__))
# # Use the current working directory instead when on Linux, otherwise it errors out. (Windows can use this too.)
# __C.COMMON.BASE_PATH = os.getcwd()

# relative path (current directory)
__C.COMMON.RELATIVE_PATH = "./"

# class file path
__C.COMMON.CLASS_FILE_PATH = os.path.join(__C.COMMON.BASE_PATH, "infos/classes/voc_class.txt")
# anchor file path
__C.COMMON.ANCHOR_FILE_PATH = os.path.join(__C.COMMON.BASE_PATH, "infos/anchors/coco_anchors.txt")

# IOU threshold used in the loss
__C.COMMON.IOU_LOSS_THRESH = 0.5

# hyperparameters (focal-loss alpha and gamma; see layer_loss() in yolov3.py)
__C.COMMON.ALPHA = 1.0
__C.COMMON.GAMMA = 2.0

# maximum number of bounding boxes allowed per scale
__C.COMMON.MAX_BBOX_PER_SCALE = 150
# decay rate of the moving average, controls how quickly the model updates;
# a decay close to 1 is reasonable,
# typically 0.999, 0.9999, etc.; the larger the decay, the more stable the model,
# because a larger decay slows parameter updates, letting them settle down
__C.COMMON.MOVING_AVE_DECAY = 0.9995

# image path
__C.COMMON.IMAGE_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "data/images")
# xml path
__C.COMMON.ANNOTATION_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "data/annotations")

# dataset split ratios
__C.COMMON.TRAIN_PERCENT = 0.7
__C.COMMON.VAL_PERCENT = 0.2
__C.COMMON.TEST_PERCENT = 0.1

# image file extension
__C.COMMON.IMAGE_EXTENSION = ".jpg"

# YOLO options
__C.YOLO = edict()
# the 3 scale strides of YOLOv3
__C.YOLO.STRIDES = [8, 16, 32]
# upsampling method used by YOLOv3
__C.YOLO.UP_SAMPLE_METHOD = "resize"
# YOLOv3 has 3 anchors per scale
__C.YOLO.ANCHOR_PER_SCALE = 3

# Train options
__C.TRAIN = edict()
# training / validation data
__C.TRAIN.TRAIN_DATA_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/dataset/voc_train.txt")
__C.TRAIN.VAL_DATA_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/dataset/voc_val.txt")
# training input sizes (multi-scale training)
__C.TRAIN.INPUT_SIZE_LIST = [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
__C.TRAIN.TRAIN_BATCH_SIZE = 1
__C.TRAIN.VAL_BATCH_SIZE = 2
# learning-rate range
__C.TRAIN.LEARNING_RATE_INIT = 1e-3
__C.TRAIN.LEARNING_RATE_END = 1e-6
# epochs in the first training stage
__C.TRAIN.FIRST_STAGE_EPOCHS = 16
# epochs in the second training stage; when starting from pretrained weights, the first stage trains with parameters frozen
__C.TRAIN.SECOND_STAGE_EPOCHS = 32
# warm-up training: during the first [: 2] epochs the learning rate is simply kept small by hand;
# after warm-up, the learning rate is shrunk gradually as training proceeds,
# i.e. over epochs [2: FIRST_STAGE_EPOCHS + SECOND_STAGE_EPOCHS]
__C.TRAIN.WARM_UP_EPOCHS = 2

# initial weights
__C.TRAIN.INITIAL_WEIGHT = os.path.join(__C.COMMON.RELATIVE_PATH, "checkpoint/val_loss=4.4647.ckpt-5")
# training log
__C.TRAIN.TRAIN_LOG = os.path.join(__C.COMMON.RELATIVE_PATH, "log/train_log")
# validation log
__C.TRAIN.VAL_LOG = os.path.join(__C.COMMON.RELATIVE_PATH, "log/val_log")

# FREEZE MODEL
__C.FREEZE = edict()
# ckpt model path
__C.FREEZE.CKPT_MODEL_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "checkpoint/val_loss=4.4647.ckpt-5")
# pb model path
__C.FREEZE.PB_MODEL_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "model_info/val_loss=4.4647.pb")
# YOLOv3 output node names
__C.FREEZE.YOLO_OUTPUT_NODE_NAME = ["input/input_data",
                                    "pred_sbbox/concat_2",
                                    "pred_mbbox/concat_2",
                                    "pred_lbbox/concat_2"
                                    ]


# TEST options
__C.TEST = edict()
# test dataset
__C.TEST.TEST_DATA_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "infos/dataset/voc_test.txt")
# path of the .pb model used for testing
__C.TEST.TEST_PB_MODEL_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "model_info/val_loss=4.4647.pb")
# test input size
__C.TEST.INPUT_SIZE = 544
# output image folder
__C.TEST.OUTPUT_IMAGE_FILE = os.path.join(__C.COMMON.RELATIVE_PATH, "output/test_image")
# output folder for predicted-box info
__C.TEST.OUTPUT_BOX_INFO_FILE = os.path.join(__C.COMMON.RELATIVE_PATH, "output/test_box_info")
# whether to save images with predicted boxes drawn; default True
__C.TEST.SAVE_BOXES_IMAGE_FLAG = True

__C.TEST.RETURN_ELEMENTS = ["input/input_data:0",
                            "pred_sbbox/concat_2:0",
                            "pred_mbbox/concat_2:0",
                            "pred_lbbox/concat_2:0"
                            ]

__C.TEST.VEDIO_PATH = os.path.join(__C.COMMON.RELATIVE_PATH, "data/video/test_video.mp4")
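One remark on the warm-up comments above: yolo_train.py is not reproduced in this post, but in this family of YOLOv3 implementations the warm-up is typically a linear ramp-up followed by a cosine decay from LEARNING_RATE_INIT down to LEARNING_RATE_END. A minimal sketch (the function and its names are mine, for illustration only):

import numpy as np

def learning_rate_at(global_step, warmup_steps, total_steps, lr_init=1e-3, lr_end=1e-6):
    # linear ramp-up during warm-up, cosine decay afterwards
    if global_step < warmup_steps:
        return global_step / warmup_steps * lr_init
    progress = (global_step - warmup_steps) / (total_steps - warmup_steps)
    return lr_end + 0.5 * (lr_init - lr_end) * (1 + np.cos(progress * np.pi))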

 

2. Preparing the training, validation, and test data

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/03/15 02:59
# @Author   : WanDaoYi
# @FileName : prepare.py
# ============================================

import os
import random
from datetime import datetime
import xml.etree.ElementTree as ET
import core.utils as utils
from config import cfg


class Prepare(object):

    def __init__(self):
        # image path
        self.image_path = cfg.COMMON.IMAGE_PATH
        # image file extension
        self.image_extension = cfg.COMMON.IMAGE_EXTENSION
        # xml path
        self.annotation_path = cfg.COMMON.ANNOTATION_PATH
        # get the classes as a dict {index: name}
        self.classes_dir = utils.read_class_names(cfg.COMMON.CLASS_FILE_PATH)
        self.classes_len = len(self.classes_dir)
        # get the classes as a list
        self.classes_list = [self.classes_dir[key] for key in range(self.classes_len)]

        # dataset split percentages
        self.test_percent = cfg.COMMON.TEST_PERCENT
        self.val_percent = cfg.COMMON.VAL_PERCENT

        # output paths for each data split
        self.train_data_path = cfg.TRAIN.TRAIN_DATA_PATH
        self.val_data_path = cfg.TRAIN.VAL_DATA_PATH
        self.test_data_path = cfg.TEST.TEST_DATA_PATH

        pass

    def do_prepare(self):

        xml_file_list = os.listdir(self.annotation_path)
        xml_len = len(xml_file_list)
        # compute the sample count of each split from the percentages
        n_test = int(xml_len * self.test_percent)
        n_val = int(xml_len * self.val_percent)
        n_train = xml_len - n_test - n_val

        if os.path.exists(self.train_data_path):
            os.remove(self.train_data_path)
            pass

        if os.path.exists(self.val_data_path):
            os.remove(self.val_data_path)
            pass

        if os.path.exists(self.test_data_path):
            os.remove(self.test_data_path)
            pass

        # randomly split the data
        n_train_val = n_train + n_val
        train_val_list = random.sample(xml_file_list, n_train_val)
        train_list = random.sample(train_val_list, n_train)

        train_file = open(self.train_data_path, "w")
        val_file = open(self.val_data_path, "w")
        test_file = open(self.test_data_path, "w")

        for xml_name in xml_file_list:
            # base name (the file name without its .xml extension)
            name_info = xml_name[: -4]
            # image file name
            image_name = name_info + self.image_extension

            # if the file falls into the train + val split
            if xml_name in train_val_list:
                # if the file falls into the training split
                if xml_name in train_list:
                    self.convert_annotation(xml_name, image_name, train_file)
                    train_file.write('\n')
                    pass
                # otherwise it goes into the validation file
                else:
                    self.convert_annotation(xml_name, image_name, val_file)
                    val_file.write('\n')
                    pass
                pass
            # otherwise the file goes into the test file
            else:
                self.convert_annotation(xml_name, image_name, test_file)
                test_file.write('\n')
                pass

        # close the split files so everything is flushed to disk
        train_file.close()
        val_file.close()
        test_file.close()
        pass

    def convert_annotation(self, xml_name, image_name, file):
        xml_path = os.path.join(self.annotation_path, xml_name)
        image_path = os.path.join(self.image_path, image_name)
        file.write(image_path)

        # open the xml file
        xml_file = open(xml_path)
        # parse the xml file into a tree
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for obj in root.iter("object"):
            diff = obj.find("difficult").text
            cls = obj.find("name").text
            if cls not in self.classes_list or int(diff) == 1:
                continue

            cls_id = self.classes_list.index(cls)
            xml_box = obj.find("bndbox")
            b = (int(xml_box.find('xmin').text), int(xml_box.find('ymin').text),
                 int(xml_box.find('xmax').text), int(xml_box.find('ymax').text))
            file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id))
            pass
        pass


if __name__ == "__main__":
    # start time
    start_time = datetime.now()
    print("Start time: {}".format(start_time))

    demo = Prepare()
    demo.do_prepare()

    # end time
    end_time = datetime.now()
    print("End time: {}, data preparation took: {}".format(end_time, end_time - start_time))
    pass

 

3. Building the yolo-v3 network

   (1). First, naturally, the network building-block utilities: common.py

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/03/07 14:09
# @Author   : WanDaoYi
# @FileName : common.py
# ============================================


import tensorflow as tf


# convolution
def convolutional(input_data, filters_shape, training_flag, name, down_sample=False, activate=True, bn=True):
    """
    :param input_data: input tensor
    :param filters_shape: kernel shape, e.g. (3, 3, 32, 64): a 3 x 3 kernel, 32 input channels, 64 output channels
    :param training_flag: whether we are in training mode
    :param name: name of the conv scope
    :param down_sample: whether to downsample; default is no
    :param activate: whether to apply the Leaky-ReLU activation
    :param bn: whether to apply batch normalization
    :return:
    """
    with tf.variable_scope(name):
        # downsampling: pad explicitly, then use a stride-2 VALID conv
        if down_sample:
            pad_h, pad_w = (filters_shape[0] - 2) // 2 + 1, (filters_shape[1] - 2) // 2 + 1
            paddings = tf.constant([[0, 0], [pad_h, pad_h], [pad_w, pad_w], [0, 0]])
            input_data = tf.pad(input_data, paddings, 'CONSTANT')
            strides = (1, 2, 2, 1)
            padding = 'VALID'
        # no downsampling
        else:
            strides = (1, 1, 1, 1)
            padding = "SAME"

        weight = tf.get_variable(name='weight', dtype=tf.float32, trainable=True,
                                 shape=filters_shape, initializer=tf.random_normal_initializer(stddev=0.01))
        # convolution op
        conv = tf.nn.conv2d(input=input_data, filter=weight, strides=strides, padding=padding)
        # batch normalization
        if bn:
            conv = tf.layers.batch_normalization(conv,
                                                 beta_initializer=tf.zeros_initializer(),
                                                 gamma_initializer=tf.ones_initializer(),
                                                 moving_mean_initializer=tf.zeros_initializer(),
                                                 moving_variance_initializer=tf.ones_initializer(),
                                                 training=training_flag)
        # otherwise add a bias
        else:
            bias = tf.get_variable(name='bias', shape=filters_shape[-1], trainable=True,
                                   dtype=tf.float32, initializer=tf.constant_initializer(0.0))
            conv = tf.nn.bias_add(conv, bias)

        # activation
        if activate:
            conv = tf.nn.leaky_relu(conv, alpha=0.1)

    return conv


# residual block
def residual_block(input_data, input_channel, filter_num1, filter_num2, training_flag, name):
    """
    :param input_data: input feature maps
    :param input_channel: number of input channels
    :param filter_num1: number of kernels in the first (1 x 1) conv
    :param filter_num2: number of kernels in the second (3 x 3) conv
    :param training_flag: whether we are in training mode
    :param name:
    :return:
    """

    # feature maps kept for the shortcut connection
    short_cut = input_data

    with tf.variable_scope(name):
        input_data = convolutional(input_data, filters_shape=(1, 1, input_channel, filter_num1),
                                   training_flag=training_flag, name='conv1')
        input_data = convolutional(input_data, filters_shape=(3, 3, filter_num1, filter_num2),
                                   training_flag=training_flag, name='conv2')
        # add the residual branch to the shortcut to get the block's output
        residual_output = input_data + short_cut

    return residual_output


# concat op
def route(name, previous_output, current_output):
    with tf.variable_scope(name):
        concat_output = tf.concat([current_output, previous_output], axis=-1)

    return concat_output


# upsampling
def up_sample(input_data, name, method="deconv"):
    assert method in ["resize", "deconv"]

    if method == "resize":
        with tf.variable_scope(name):
            input_shape = tf.shape(input_data)
            up_sample_output = tf.image.resize_nearest_neighbor(input_data, (input_shape[1] * 2, input_shape[2] * 2))
        pass
    else:
        # number of input filters
        filter_num = input_data.shape.as_list()[-1]
        up_sample_output = tf.layers.conv2d_transpose(input_data, filter_num, kernel_size=2,
                                                      padding='same', strides=(2, 2),
                                                      kernel_initializer=tf.random_normal_initializer())
        pass

    return up_sample_output
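
Before moving on, a quick sketch of how these helpers compose (my own illustration, not part of the repo; the shapes follow the darknet53 stem shown below):

import tensorflow as tf
import core.common as common

inputs = tf.placeholder(tf.float32, [None, 416, 416, 3])
training = tf.placeholder(tf.bool)

# 3 x 3 conv, 3 -> 32 channels, resolution unchanged
net = common.convolutional(inputs, (3, 3, 3, 32), training, name="demo_conv0")
# stride-2 downsample: 416 x 416 -> 208 x 208, 32 -> 64 channels
net = common.convolutional(net, (3, 3, 32, 64), training, name="demo_conv1", down_sample=True)
# residual block: 1 x 1 down to 32 channels, 3 x 3 back up to 64, plus the shortcut
net = common.residual_block(net, 64, 32, 64, training, name="demo_residual0")
print(net)  # Tensor of shape (?, 208, 208, 64)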

   (2). Next come more utility functions; after all, to do a good job one must first sharpen one's tools. Enter utils.py.

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/03/07 14:10
# @Author   : WanDaoYi
# @FileName : utils.py
# ============================================

import cv2
import random
import colorsys
import numpy as np
import tensorflow as tf
from config import cfg


# get the classes
def read_class_names(class_file_name):
    """
    :param class_file_name: path of the class file
    :return:
    """
    names = {}
    with open(class_file_name, 'r') as data:
        # map indices to class names, for converting between ids and classes
        for ID, name in enumerate(data):
            names[ID] = name.strip('\n')
    return names


# get the anchors
def get_anchors(anchors_path):
    """
    :param anchors_path: path of the anchor file
    :return:
    """
    with open(anchors_path) as f:
        anchors = f.readline()
    anchors = np.array(anchors.split(','), dtype=np.float32)
    # anchors = anchors.reshape(-1, 2)
    anchors = anchors.reshape(3, 3, 2)
    # print("anchors.reshape:{}".format(anchors))
    return anchors
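
# For reference, coco_anchors.txt is expected to hold a single comma-separated line
# of 9 anchor pairs (reshaped above into 3 scales x 3 anchors x 2); with the anchor
# values quoted in yolov3.py below, the file would read:
# 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326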


# read the image paths from a data file
def read_data_path(file_path):

    image_path_list = []
    with open(file_path) as file:
        line_list = file.readlines()
        for line_info in line_list:
            data_info = line_info.strip()
            image_path = data_info.split()[0]
            image_path_list.append(image_path)
            pass
        pass

    return image_path_list
    pass


# read the data lines
def read_data_line(file_path):
    data_line_list = []
    with open(file_path) as file:
        line_list = file.readlines()
        for line_info in line_list:
            data_info = line_info.strip().split()
            data_line_list.append(data_info)

    return data_line_list


# image preprocessing
def image_preporcess(image, target_size, gt_boxes=None):
    """
    :param image: image data
    :param target_size: target size
    :param gt_boxes: ground-truth boxes
    :return:
    """
    # color-space conversion (BGR -> RGB)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)

    # target size, e.g. 416 x 416
    ih, iw = target_size
    # actual image size, e.g. h x w = 632 x 900
    h, w, _ = image.shape

    # resize so the longer side matches the target and the other side is <= the target,
    # ensuring neither side overflows its target length.
    # With the example values above this gives 0.4622222222222222
    scale = min(iw / w, ih / h)
    # new width: 416
    nw = int(scale * w)
    # new height: 292
    nh = int(scale * h)
    # resize the real image to 416 x 292
    image_resized = cv2.resize(image, (nw, nh))

    # create a 416 x 416 3-channel canvas filled entirely with the value 128
    image_paded = np.full(shape=[ih, iw, 3], fill_value=128.0)
    # to center the resized image on the 416 x 416 canvas,
    # first compute the gap between the resized image's sides and the target sides,
    # then halve the gaps to get the centering offsets.
    # With the example values this gives 0
    dw = (iw - nw) // 2
    # with the example values this gives 62
    dh = (ih - nh) // 2
    # place the resized image at the center of the canvas;
    # with the example values: [62: 292 + 62, 0: 416 + 0, :]
    # so in each of the 3 channels there are 62 blank rows above and below,
    # keeping the real image centered in the target. This matters for convolutional feature extraction.
    image_paded[dh: nh + dh, dw: nw + dw, :] = image_resized
    # normalize the result
    image_paded = image_paded / 255.

    if gt_boxes is None:
        return image_paded

    else:
        # the image was scaled and padded, so map the ground-truth boxes with the same scale and offsets
        gt_boxes[:, [0, 2]] = gt_boxes[:, [0, 2]] * scale + dw
        gt_boxes[:, [1, 3]] = gt_boxes[:, [1, 3]] * scale + dh
        return image_paded, gt_boxes


# draw bounding boxes
def draw_bbox(image, bboxes, classes=None, show_label=True):
    """
    :param image:
    :param bboxes: [x_min, y_min, x_max, y_max, probability, cls_id] format coordinates.
    :param classes: index and class dic
    :param show_label:
    :return:
    """
    # get the class indices and names
    if classes is None:
        classes = read_class_names(cfg.COMMON.CLASS_FILE_PATH)
        pass

    class_num = len(classes)
    image_h, image_w, _ = image.shape

    # generate one 3-channel HSV color tuple per class
    hsv_tuples = [(1.0 * x / class_num, 1., 1.) for x in range(class_num)]
    # convert the colors from HSV to RGB
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    # scale the normalized color values back to 0~255
    colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))

    # fix the random seed
    random.seed(32)
    # shuffle the box colors
    random.shuffle(colors)
    random.seed(None)

    for i, bbox in enumerate(bboxes):
        # box coordinates
        coor = np.array(bbox[:4], dtype=np.int32)
        font_scale = 0.5
        # score
        score = bbox[4]
        # class index
        class_ind = int(bbox[5])
        # pick the box color by class index
        bbox_color = colors[class_ind]
        # line thickness of the box
        bbox_thick = int(0.6 * (image_h + image_w) / 600)
        # top-left and bottom-right corners
        c1 = (coor[0], coor[1])
        c2 = (coor[2], coor[3])
        # draw the box
        cv2.rectangle(image, c1, c2, bbox_color, bbox_thick)

        # draw the label
        if show_label:
            # look up the class name by its index
            bbox_mess = '%s: %.2f' % (classes[class_ind], score)
            t_size = cv2.getTextSize(bbox_mess, 0, font_scale, thickness=bbox_thick // 2)[0]
            cv2.rectangle(image, c1, (c1[0] + t_size[0], c1[1] - t_size[1] - 3), bbox_color, -1)  # filled

            cv2.putText(image, bbox_mess, (c1[0], c1[1] - 2), cv2.FONT_HERSHEY_SIMPLEX,
                        font_scale, (0, 0, 0), bbox_thick // 2, lineType=cv2.LINE_AA)

    return image


# (x, y, w, h) --> (xmin, ymin, xmax, ymax)
def bbox_xywh_dxdy(boxes):
    boxes = np.array(boxes)
    boxes_dxdy = np.concatenate([boxes[..., :2] - boxes[..., 2:] * 0.5,
                                 boxes[..., :2] + boxes[..., 2:] * 0.5], axis=-1)

    return boxes_dxdy
    pass


# (xmin, ymin, xmax, ymax) --> (x, y, w, h)
def bbox_dxdy_xywh(boxes):
    boxes = np.array(boxes)
    # convert [xmin, ymin, xmax, ymax] --> [x, y, w, h]
    bbox_xywh = np.concatenate([(boxes[2:] + boxes[:2]) * 0.5,
                                boxes[2:] - boxes[:2]], axis=-1)

    return bbox_xywh
    pass


# IOU
def bboxes_iou(boxes1, boxes2):
    boxes1 = np.array(boxes1)
    boxes2 = np.array(boxes2)

    # box areas
    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])

    # top-left corner of the intersection
    left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
    # bottom-right corner of the intersection
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])

    # height and width of the intersection rectangle
    inter_section = np.maximum(right_down - left_up, 0.0)

    # intersection area of the two boxes
    inter_area = inter_section[..., 0] * inter_section[..., 1]

    # union area of the two boxes
    union_area = boxes1_area + boxes2_area - inter_area
    # IOU (clamped away from zero with eps)
    ious = np.maximum(1.0 * inter_area / union_area, np.finfo(np.float32).eps)

    return ious
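
# A quick numeric check of bboxes_iou (my own example): boxes [0, 0, 2, 2] and
# [1, 1, 3, 3] each have area 4 and intersect in a 1 x 1 square,
# so the IOU is 1 / (4 + 4 - 1) ≈ 0.143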


# load a .pb model and return the requested tensors
def read_pb_return_tensors(graph, pb_file, return_elements):
    with tf.gfile.FastGFile(pb_file, 'rb') as f:
        frozen_graph_def = tf.GraphDef()
        frozen_graph_def.ParseFromString(f.read())

    with graph.as_default():
        return_elements = tf.import_graph_def(frozen_graph_def,
                                              return_elements=return_elements)
    return return_elements


# non-maximum suppression
def nms(bboxes, iou_threshold, sigma=0.3, method='nms'):
    """
    :param bboxes: (xmin, ymin, xmax, ymax, score, class)
    :param iou_threshold: IOU threshold
    :param sigma: width of the Gaussian used by soft-nms
    :param method: suppression method, 'nms' or 'soft-nms'
    :return:
    """
    # list of the distinct classes present among the boxes
    classes_in_img = list(set(bboxes[:, 5]))
    best_bboxes = []

    for cls in classes_in_img:
        # build a boolean mask over the batch of boxes:
        # True where the box's class equals cls, False otherwise
        # e.g. bboxes = [[12, 12, 94, 94, 0.78, 0],
        #                [34, 34, 64, 64, 0.88, 1],
        #                [78, 78, 124, 124, 0.98, 0],
        #                [52, 52, 74, 74, 0.78, 1]
        #               ]
        # the first pass yields the cls_mask list [True, False, True, False]
        # the second pass yields the cls_mask list [False, True, False, True]
        cls_mask = (bboxes[:, 5] == cls)
        # the first pass selects the cls_bboxes list [[12, 12, 94, 94, 0.78, 0], [78, 78, 124, 124, 0.98, 0]]
        # the second pass selects the cls_bboxes list [[34, 34, 64, 64, 0.88, 1], [52, 52, 74, 74, 0.78, 1]]
        cls_bboxes = bboxes[cls_mask]

        while len(cls_bboxes) > 0:
            # index of the highest-scoring box
            max_ind = np.argmax(cls_bboxes[:, 4])
            # the box with the highest score is the best box
            best_bbox = cls_bboxes[max_ind]
            # collect all the best boxes in one list
            best_bboxes.append(best_bbox)
            # the boxes that remain after removing the best one
            cls_bboxes = np.concatenate([cls_bboxes[: max_ind], cls_bboxes[max_ind + 1:]])
            # IOU between the best box and the remaining boxes
            iou = bboxes_iou(best_bbox[np.newaxis, :4], cls_bboxes[:, :4])
            # weight vector of length len(iou), filled with ones
            weight = np.ones((len(iou),), dtype=np.float32)

            assert method in ['nms', 'soft-nms']

            if method == 'nms':
                # zero the weights whose IOU exceeds the threshold, so those boxes are removed below
                iou_mask = iou > iou_threshold
                weight[iou_mask] = 0.0

            if method == 'soft-nms':
                weight = np.exp(-(1.0 * iou ** 2 / sigma))

            # suppress boxes above the threshold, repeating until cls_bboxes is empty:
            # set the scores of boxes above the threshold to zero
            cls_bboxes[:, 4] = cls_bboxes[:, 4] * weight
            # keep only the boxes whose score is still above zero
            score_mask = cls_bboxes[:, 4] > 0.
            cls_bboxes = cls_bboxes[score_mask]

    return best_bboxes


# post-process the predicted boxes
def postprocess_boxes(pred_bbox, org_img_shape, input_size, score_threshold):
    """
    :param pred_bbox: predicted boxes
    :param org_img_shape: shape of the original image
    :param input_size: network input size
    :param score_threshold: score threshold
    :return:
    """
    valid_scale = [0, np.inf]
    pred_bbox = np.array(pred_bbox)

    # box coordinates
    pred_xywh = pred_bbox[:, 0:4]
    # box confidence
    pred_conf = pred_bbox[:, 4]
    # class probabilities
    pred_prob = pred_bbox[:, 5:]

    # (1) (x, y, w, h) --> (xmin, ymin, xmax, ymax)
    pred_coor = np.concatenate([pred_xywh[:, :2] - pred_xywh[:, 2:] * 0.5,
                                pred_xywh[:, :2] + pred_xywh[:, 2:] * 0.5], axis=-1)

    # (2) (xmin, ymin, xmax, ymax) -> (xmin_org, ymin_org, xmax_org, ymax_org)
    org_h, org_w = org_img_shape
    resize_ratio = min(input_size / org_w, input_size / org_h)

    dw = (input_size - resize_ratio * org_w) / 2
    dh = (input_size - resize_ratio * org_h) / 2

    # shift the predicted x coordinates (xmin, xmax) pred_coor[:, 0::2] by the horizontal padding dw,
    # then divide by the resize ratio to recover x in the original image
    pred_coor[:, 0::2] = 1.0 * (pred_coor[:, 0::2] - dw) / resize_ratio
    # shift the predicted y coordinates (ymin, ymax) pred_coor[:, 1::2] by the vertical padding dh,
    # then divide by the resize ratio to recover y in the original image
    pred_coor[:, 1::2] = 1.0 * (pred_coor[:, 1::2] - dh) / resize_ratio

    # (3) clip boxes that fall outside the original image
    pred_coor = np.concatenate([np.maximum(pred_coor[:, :2], [0, 0]),
                                np.minimum(pred_coor[:, 2:], [org_w - 1, org_h - 1])], axis=-1)

    # zero out degenerate boxes (min coordinate greater than max)
    invalid_mask = np.logical_or((pred_coor[:, 0] > pred_coor[:, 2]), (pred_coor[:, 1] > pred_coor[:, 3]))
    pred_coor[invalid_mask] = 0

    # (4) discard invalid boxes
    bboxes_scale = np.sqrt(np.multiply.reduce(pred_coor[:, 2:4] - pred_coor[:, 0:2], axis=-1))
    # numpy logical and
    scale_mask = np.logical_and((valid_scale[0] < bboxes_scale), (bboxes_scale < valid_scale[1]))

    # (5) discard boxes with low scores
    classes = np.argmax(pred_prob, axis=-1)
    scores = pred_conf * pred_prob[np.arange(len(pred_coor)), classes]
    score_mask = scores > score_threshold
    mask = np.logical_and(scale_mask, score_mask)
    coors, scores, classes = pred_coor[mask], scores[mask], classes[mask]

    return np.concatenate([coors, scores[:, np.newaxis], classes[:, np.newaxis]], axis=-1)
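
Taken together, these utilities already cover the whole image-inference path that yolo_test.py follows; a hedged outline (the file names, class count, and thresholds here are my own placeholders):

import cv2
import numpy as np
import tensorflow as tf
import core.utils as utils
from config import cfg

graph = tf.Graph()
# the input tensor plus the three scale outputs listed in cfg.TEST.RETURN_ELEMENTS
return_tensors = utils.read_pb_return_tensors(graph, cfg.TEST.TEST_PB_MODEL_PATH, cfg.TEST.RETURN_ELEMENTS)

image = cv2.imread("demo.jpg")  # placeholder input image
image_data = utils.image_preporcess(np.copy(image), [cfg.TEST.INPUT_SIZE, cfg.TEST.INPUT_SIZE])
image_data = image_data[np.newaxis, ...]

with tf.Session(graph=graph) as sess:
    pred_sbbox, pred_mbbox, pred_lbbox = sess.run(return_tensors[1:],
                                                  feed_dict={return_tensors[0]: image_data})

num_class = 20  # assumption: the 20 VOC classes
pred_bbox = np.concatenate([np.reshape(pred_sbbox, (-1, 5 + num_class)),
                            np.reshape(pred_mbbox, (-1, 5 + num_class)),
                            np.reshape(pred_lbbox, (-1, 5 + num_class))], axis=0)
bboxes = utils.postprocess_boxes(pred_bbox, image.shape[:2], cfg.TEST.INPUT_SIZE, score_threshold=0.3)
bboxes = utils.nms(np.array(bboxes), iou_threshold=0.45)
image = utils.draw_bbox(image, bboxes)
cv2.imwrite("demo_result.jpg", image)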

   (3). Next, the darknet53 backbone makes its grand entrance: backbone.py

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/03/07 14:10
# @Author   : WanDaoYi
# @FileName : backbone.py
# ============================================


import core.common as common
import tensorflow as tf


def darknet53(input_data, training_flag):
    with tf.variable_scope('darknet'):

        input_data = common.convolutional(input_data, filters_shape=(3, 3, 3, 32), 
                                          training_flag=training_flag, name='conv0')
        input_data = common.convolutional(input_data, filters_shape=(3, 3, 32, 64),
                                          training_flag=training_flag, name='conv1', down_sample=True)

        for i in range(1):
            input_data = common.residual_block(input_data, 64, 32, 64, 
                                               training_flag=training_flag, name='residual%d' % (i + 0))

        input_data = common.convolutional(input_data, filters_shape=(3, 3, 64, 128),
                                          training_flag=training_flag, name='conv4', down_sample=True)

        for i in range(2):
            input_data = common.residual_block(input_data, 128, 64, 128, training_flag=training_flag,
                                               name='residual%d' % (i + 1))

        input_data = common.convolutional(input_data, filters_shape=(3, 3, 128, 256),
                                          training_flag=training_flag, name='conv9', down_sample=True)

        for i in range(8):
            input_data = common.residual_block(input_data, 256, 128, 256, training_flag=training_flag,
                                               name='residual%d' % (i + 3))

        route_1 = input_data
        input_data = common.convolutional(input_data, filters_shape=(3, 3, 256, 512),
                                          training_flag=training_flag, name='conv26', down_sample=True)

        for i in range(8):
            input_data = common.residual_block(input_data, 512, 256, 512, training_flag=training_flag,
                                               name='residual%d' % (i + 11))

        route_2 = input_data
        input_data = common.convolutional(input_data, filters_shape=(3, 3, 512, 1024),
                                          training_flag=training_flag, name='conv43', down_sample=True)

        for i in range(4):
            input_data = common.residual_block(input_data, 1024, 512, 1024, training_flag=training_flag,
                                               name='residual%d' % (i + 19))

        return route_1, route_2, input_data
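
For a 416 x 416 input, route_1 is the 52 x 52 x 256 feature map, route_2 the 26 x 26 x 512 one, and the final return value the 13 x 13 x 1024 one; these feed the three detection scales of yolov3.py below.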

   (4). With the groundwork above in place, we can build the yolo-v3 network model: yolov3.py

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/03/07 14:10
# @Author   : WanDaoYi
# @FileName : yolov3.py
# ============================================

import numpy as np
import tensorflow as tf
import core.utils as utils
import core.common as common
import core.backbone as backbone
from config import cfg


class YoloV3(object):

    def __init__(self, input_data, training_flag):

        # hyperparameters (focal-loss alpha and gamma)
        self.alpha = cfg.COMMON.ALPHA
        self.gamma = cfg.COMMON.GAMMA

        # whether outputs are computed in training mode; True for training
        self.training_flag = training_flag
        # the classes
        self.classes = utils.read_class_names(cfg.COMMON.CLASS_FILE_PATH)
        # the number of classes
        self.num_class = len(self.classes)
        # the 3 scale strides of YOLOv3
        self.strides = np.array(cfg.YOLO.STRIDES)
        # the anchors
        self.anchors = utils.get_anchors(cfg.COMMON.ANCHOR_FILE_PATH)
        # 3 anchors per scale
        self.anchor_per_scale = cfg.YOLO.ANCHOR_PER_SCALE
        # IOU threshold used in the loss
        self.iou_loss_thresh = cfg.COMMON.IOU_LOSS_THRESH
        # upsampling method
        self.up_sample_method = cfg.YOLO.UP_SAMPLE_METHOD

        try:
            # build the yolov3 network and get the feature maps of the small, medium, and large scales
            self.conv_lbbox, self.conv_mbbox, self.conv_sbbox = self.structure_network(input_data)
            pass
        except:
            raise NotImplementedError("Can not structure yolov3 network!")
            pass

        # anchors list [[116.,  90.], [156., 198.], [373., 326.]]:
        # the small feature maps use the large anchor values, for detecting large objects
        with tf.variable_scope('pred_lbbox'):
            self.pred_lbbox = self.pred_conv_bbox(self.conv_lbbox, self.anchors[2], self.strides[2])
            pass

        # anchors list [[30.,  61.], [62., 45.], [59., 119.]], for detecting medium objects
        with tf.variable_scope('pred_mbbox'):
            self.pred_mbbox = self.pred_conv_bbox(self.conv_mbbox, self.anchors[1], self.strides[1])
            pass

        # anchors list [[10.,  13.], [16., 30.], [33., 23.]]:
        # the large feature maps use the relatively small anchor values, for detecting small objects
        with tf.variable_scope('pred_sbbox'):
            self.pred_sbbox = self.pred_conv_bbox(self.conv_sbbox, self.anchors[0], self.strides[0])
            pass
        pass

    # build the network
    def structure_network(self, input_data):
        # run darknet53 and collect its three scale outputs
        route_1, route_2, input_data = backbone.darknet53(input_data, self.training_flag)

        # conv set: 1 x 1 -> 3 x 3 -> 1 x 1 -> 3 x 3 -> 1 x 1
        input_data = common.convolutional(input_data, (1, 1, 1024, 512), self.training_flag, 'conv52')
        input_data = common.convolutional(input_data, (3, 3, 512, 1024), self.training_flag, 'conv53')
        input_data = common.convolutional(input_data, (1, 1, 1024, 512), self.training_flag, 'conv54')
        input_data = common.convolutional(input_data, (3, 3, 512, 1024), self.training_flag, 'conv55')
        input_data = common.convolutional(input_data, (1, 1, 1024, 512), self.training_flag, 'conv56')

        # scale one, the small feature maps: conv set -> 3 x 3 conv -> 1 x 1 conv
        conv_lobj_branch = common.convolutional(input_data, (3, 3, 512, 1024),
                                                self.training_flag, name='conv_lobj_branch')
        # 3 * (self.num_class + 5) means: each scale has 3 anchors, and each anchor carries 5 + num_class values;
        # 5 = 4 + 1: 4 coordinates plus 1 confidence; num_class is the number of classes
        conv_lbbox = common.convolutional(conv_lobj_branch, (1, 1, 1024, 3 * (self.num_class + 5)),
                                          training_flag=self.training_flag, name='conv_lbbox',
                                          activate=False, bn=False)

        # branch off: conv set -> 1 x 1 conv -> upsample -> concatenate
        input_data = common.convolutional(input_data, (1, 1, 512, 256), self.training_flag, 'conv57')
        input_data = common.up_sample(input_data, name='up_sample0', method=self.up_sample_method)

        with tf.variable_scope('route_1'):
            input_data = tf.concat([input_data, route_2], axis=-1)

        # conv set: 1 x 1 -> 3 x 3 -> 1 x 1 -> 3 x 3 -> 1 x 1
        input_data = common.convolutional(input_data, (1, 1, 768, 256), self.training_flag, 'conv58')
        input_data = common.convolutional(input_data, (3, 3, 256, 512), self.training_flag, 'conv59')
        input_data = common.convolutional(input_data, (1, 1, 512, 256), self.training_flag, 'conv60')
        input_data = common.convolutional(input_data, (3, 3, 256, 512), self.training_flag, 'conv61')
        input_data = common.convolutional(input_data, (1, 1, 512, 256), self.training_flag, 'conv62')

        # scale two, the medium feature maps: conv set -> 3 x 3 conv -> 1 x 1 conv
        conv_mobj_branch = common.convolutional(input_data, (3, 3, 256, 512),
                                                self.training_flag, name='conv_mobj_branch')
        conv_mbbox = common.convolutional(conv_mobj_branch, (1, 1, 512, 3 * (self.num_class + 5)),
                                          training_flag=self.training_flag, name='conv_mbbox',
                                          activate=False, bn=False)

        # branch off: conv set -> 1 x 1 conv -> upsample -> concatenate
        input_data = common.convolutional(input_data, (1, 1, 256, 128), self.training_flag, 'conv63')
        input_data = common.up_sample(input_data, name='up_sample1', method=self.up_sample_method)

        with tf.variable_scope('route_2'):
            input_data = tf.concat([input_data, route_1], axis=-1)

        # conv set: 1 x 1 -> 3 x 3 -> 1 x 1 -> 3 x 3 -> 1 x 1
        input_data = common.convolutional(input_data, (1, 1, 384, 128), self.training_flag, 'conv64')
        input_data = common.convolutional(input_data, (3, 3, 128, 256), self.training_flag, 'conv65')
        input_data = common.convolutional(input_data, (1, 1, 256, 128), self.training_flag, 'conv66')
        input_data = common.convolutional(input_data, (3, 3, 128, 256), self.training_flag, 'conv67')
        input_data = common.convolutional(input_data, (1, 1, 256, 128), self.training_flag, 'conv68')

        # scale three, the large feature maps: conv set -> 3 x 3 conv -> 1 x 1 conv
        conv_sobj_branch = common.convolutional(input_data, (3, 3, 128, 256),
                                                self.training_flag, name='conv_sobj_branch')
        conv_sbbox = common.convolutional(conv_sobj_branch, (1, 1, 256, 3 * (self.num_class + 5)),
                                          training_flag=self.training_flag, name='conv_sbbox',
                                          activate=False, bn=False)

        # return the raw feature maps of the three scales, small, medium, and large (the 13 x 13-style maps)
        return conv_lbbox, conv_mbbox, conv_sbbox
        pass

    # decode predictions from one of the 3 scales of yolov3 feature maps
    def pred_conv_bbox(self, conv_bbox, anchors, strides):
        """
        :param conv_bbox: a feature map returned by the yolov3 network
        :param anchors: anchors, e.g. [[10., 13.], [16., 30.], [33., 23.]];
                        [10., 13.] is an anchor-box prior of size 10 x 13
        :param strides: one of the scale strides [8, 16, 32] (for a 416 x 416 input);
                        e.g. 32 means operating at stride 32, giving a 13 x 13 feature map,
                        i.e. 1/32 of the original image; 8 and 16 work the same way.
        :return:
        """
        # shape of conv_bbox
        conv_bbox_shape = tf.shape(conv_bbox)
        # batch size of conv_bbox
        batch_size = conv_bbox_shape[0]
        # spatial size of conv_bbox
        conv_bbox_size = conv_bbox_shape[1]
        # number of anchors per scale
        anchor_per_scale = len(anchors)

        # reshape conv_bbox into the target layout, for easy slicing
        conv_obj = tf.reshape(conv_bbox, (batch_size, conv_bbox_size,
                                          conv_bbox_size, anchor_per_scale,
                                          5 + self.num_class))

        # raw center-point offsets
        conv_raw_dxdy = conv_obj[:, :, :, :, 0:2]
        # raw width and height
        conv_raw_dwdh = conv_obj[:, :, :, :, 2:4]
        # raw confidence, i.e. the foreground-vs-background logit
        conv_raw_conf = conv_obj[:, :, :, :, 4:5]
        # raw logits of the c classes
        conv_raw_prob = conv_obj[:, :, :, :, 5:]

        # build a (conv_bbox_size, conv_bbox_size) tensor along the y axis,
        # filled with the row indices, marking each cell's absolute position
        y = tf.tile(tf.range(conv_bbox_size, dtype=tf.int32)[:, tf.newaxis], [1, conv_bbox_size])
        # build a (conv_bbox_size, conv_bbox_size) tensor along the x axis,
        # filled with the column indices, marking each cell's absolute position
        x = tf.tile(tf.range(conv_bbox_size, dtype=tf.int32)[tf.newaxis, :], [conv_bbox_size, 1])

        # concat the two (conv_bbox_size, conv_bbox_size) tensors along a new channel,
        # giving a (conv_bbox_size, conv_bbox_size, 2) tensor holding each grid cell's absolute position
        xy_grid = tf.concat([x[:, :, tf.newaxis], y[:, :, tf.newaxis]], axis=-1)
        # tile into the (batch_size, conv_bbox_size, conv_bbox_size, anchor_per_scale, 2) layout
        xy_grid = tf.tile(xy_grid[tf.newaxis, :, :, tf.newaxis, :], [batch_size, 1, 1, anchor_per_scale, 1])
        # cast to float
        xy_grid = tf.cast(xy_grid, tf.float32)

        # map the predicted x, y back to center coordinates in the original image: (offset + cell top-left) * stride
        pred_xy = (tf.sigmoid(conv_raw_dxdy) + xy_grid) * strides
        # map the predicted w, h back to width and height in the original image;
        # the paper's formula is b_w = p_w * e^(t_w); multiplying by the stride maps back to the original image,
        # where p_w is the prior width, i.e. the w of the anchor box.
        pred_wh = (tf.exp(conv_raw_dwdh) * anchors) * strides
        # concatenate the centers with the widths and heights
        pred_xywh = tf.concat([pred_xy, pred_wh], axis=-1)

        # confidence
        pred_conf = tf.sigmoid(conv_raw_conf)
        # probabilities of the c classes
        pred_prob = tf.sigmoid(conv_raw_prob)

        # return a feature map of shape [batch_size, conv_bbox_size, conv_bbox_size, anchor_per_scale, 4 + 1 + class_num],
        # where 4 + 1 + class_num stands for pred_xywh + pred_conf + pred_prob;
        # pred_conf is 1 near matched anchors and 0 away from them,
        # and pred_prob is close to 1 near matched anchors and close to 0 away from them
        return tf.concat([pred_xywh, pred_conf, pred_prob], axis=-1)
        pass

    def compute_loss(self, label_sbbox, label_mbbox, label_lbbox, true_sbbox, true_mbbox, true_lbbox):
        """
        :param label_sbbox: the per-cell label info, containing 5 + classes values
        :param label_mbbox:
        :param label_lbbox:
        :param true_sbbox: the batch's ground-truth boxes at the corresponding stride,
                           [batch_size, ground_truth_num, xywh]; ground_truth_num is the number of boxes per image
        :param true_mbbox:
        :param true_lbbox:
        :return:
        """

        # compute the loss at each of the three scales
        with tf.name_scope('smaller_box_loss'):
            loss_sbbox = self.layer_loss(self.conv_sbbox, self.pred_sbbox,
                                         label_sbbox, true_sbbox,
                                         stride=self.strides[0])

        with tf.name_scope('medium_box_loss'):
            loss_mbbox = self.layer_loss(self.conv_mbbox, self.pred_mbbox,
                                         label_mbbox, true_mbbox,
                                         stride=self.strides[1])

        with tf.name_scope('bigger_box_loss'):
            loss_lbbox = self.layer_loss(self.conv_lbbox, self.pred_lbbox,
                                         label_lbbox, true_lbbox,
                                         stride=self.strides[2])

        # sum the three scales' losses
        with tf.name_scope('giou_loss'):
            giou_loss = loss_sbbox[0] + loss_mbbox[0] + loss_lbbox[0]

        with tf.name_scope('conf_loss'):
            conf_loss = loss_sbbox[1] + loss_mbbox[1] + loss_lbbox[1]

        with tf.name_scope('prob_loss'):
            prob_loss = loss_sbbox[2] + loss_mbbox[2] + loss_lbbox[2]

        return giou_loss, conf_loss, prob_loss

    def layer_loss(self, conv_bbox, pred_bbox, label_bbox, true_bbox, stride):
        """
        :param conv_bbox: the raw feature maps of one scale output by the yolov3 network
        :param pred_bbox: the decoded predictions for that scale's feature maps
        :param label_bbox: the label info corresponding to the ground truth
        :param true_bbox: the true ground-truth box values at this anchor scale
        :param stride: one of the scale strides stride = [8, 16, 32]
        :return:
        """

        conv_shape = tf.shape(conv_bbox)
        batch_size = conv_shape[0]
        conv_bbox_size = conv_shape[1]
        input_size = stride * conv_bbox_size
        conv_bbox = tf.reshape(conv_bbox, (batch_size, conv_bbox_size, conv_bbox_size,
                                           self.anchor_per_scale, 5 + self.num_class))

        conv_raw_conf = conv_bbox[:, :, :, :, 4:5]
        conv_raw_prob = conv_bbox[:, :, :, :, 5:]

        # pred_bbox is the [batch_size, conv_bbox_size, conv_bbox_size, anchor_per_scale, 4 + 1 + class_num] feature map
        pred_xywh = pred_bbox[:, :, :, :, 0:4]
        pred_conf = pred_bbox[:, :, :, :, 4:5]

        label_xywh = label_bbox[:, :, :, :, 0:4]
        respond_bbox = label_bbox[:, :, :, :, 4:5]
        label_prob = label_bbox[:, :, :, :, 5:]

        # GIOU between the predicted boxes and the label boxes
        giou = tf.expand_dims(self.bbox_giou(pred_xywh, label_xywh), axis=-1)
        input_size = tf.cast(input_size, tf.float32)

        # giou loss, using 1 < bbox_loss_scale < 2 as a penalty coefficient on giou_loss:
        # when a box is small relative to the whole image, prediction accuracy tends to be lower than
        # for larger regions, so a larger loss is needed to steer training accuracy; if the loss is tiny
        # while accuracy is poor, it is hard to raise accuracy by lowering the loss, whereas a somewhat
        # larger loss leaves room to do so. The 1 < bbox_loss_scale < 2 range is a value the author
        # found to work well experimentally.
        bbox_loss_scale = 2.0 - 1.0 * label_xywh[:, :, :, :, 2:3] * label_xywh[:, :, :, :, 3:4] / (input_size ** 2)
        # multiply by the objectness label, since background cells carry no giou loss
        giou_loss = respond_bbox * bbox_loss_scale * (1 - giou)

        # IOU between predicted boxes and ground-truth boxes:
        # [batch_size, conv_bbox_size, conv_bbox_size, anchor_per_scale, ground_truth_box_num, xywh]
        # ground_truth_box_num: the number of boxes drawn in one image
        # e.g. pred_xywh covers 13 x 13 cells and ground_truth_box_num is 2:
        # the coordinates of each cell are matched against both ground-truth boxes' coordinates via IOU,
        # and this iou is used below to pick out the negative samples
        iou = self.bbox_iou(pred_xywh[:, :, :, :, np.newaxis, :],
                            true_bbox[:, np.newaxis, np.newaxis, np.newaxis, :, :])

        # tf.reduce_max(iou, axis=-1) takes the largest iou along the last dimension;
        # expand_dims adds a dimension, e.g. [1, 2, 3] --> [[1], [2], [3]]
        max_iou = tf.expand_dims(tf.reduce_max(iou, axis=-1), axis=-1)

        # negative-sample (background) coefficient
        respond_bgd = (1.0 - respond_bbox) * tf.cast(max_iou < self.iou_loss_thresh, tf.float32)

        # Focal loss: a refinement of cross-entropy that reduces the influence of easy negatives on the model
        # focal_loss = -(respond_bbox - pred_conf) ^ gamma * log(pred_conf)
        # conf_focal is the penalty coefficient for negative samples
        conf_focal = self.alpha * tf.pow(tf.abs(respond_bbox - pred_conf), self.gamma)

        # respond_bbox acts as the positive-sample coefficient (its value is 0 for negatives),
        # respond_bgd as the negative-sample coefficient
        # confidence loss
        conf_loss = conf_focal * (
                respond_bbox * tf.nn.sigmoid_cross_entropy_with_logits(labels=respond_bbox, logits=conv_raw_conf)
                +
                respond_bgd * tf.nn.sigmoid_cross_entropy_with_logits(labels=respond_bbox, logits=conv_raw_conf)
        )

        # loss over the c class probabilities
        prob_loss = respond_bbox * tf.nn.sigmoid_cross_entropy_with_logits(labels=label_prob, logits=conv_raw_prob)

        # sum each loss over the box dimensions, then take the batch mean
        giou_loss = tf.reduce_mean(tf.reduce_sum(giou_loss, axis=[1, 2, 3, 4]))
        conf_loss = tf.reduce_mean(tf.reduce_sum(conf_loss, axis=[1, 2, 3, 4]))
        prob_loss = tf.reduce_mean(tf.reduce_sum(prob_loss, axis=[1, 2, 3, 4]))

        return giou_loss, conf_loss, prob_loss

    # bounding boxes giou
    def bbox_giou(self, boxes1, boxes2):

        # (x, y, w, h) --> (xmin, ymin, xmax, ymax)
        boxes1 = tf.concat([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                            boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
        boxes2 = tf.concat([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                            boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)

        # order the coordinates so we have true top-left and bottom-right corners
        boxes1 = tf.concat([tf.minimum(boxes1[..., :2], boxes1[..., 2:]),
                            tf.maximum(boxes1[..., :2], boxes1[..., 2:])], axis=-1)
        boxes2 = tf.concat([tf.minimum(boxes2[..., :2], boxes2[..., 2:]),
                            tf.maximum(boxes2[..., :2], boxes2[..., 2:])], axis=-1)

        # box areas
        boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
        boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])

        # top-left and bottom-right corners of the intersection
        left_up = tf.maximum(boxes1[..., :2], boxes2[..., :2])
        right_down = tf.minimum(boxes1[..., 2:], boxes2[..., 2:])

        # zero where the two boxes do not intersect
        inter_section = tf.maximum(right_down - left_up, 0.0)
        # intersection area
        inter_area = inter_section[..., 0] * inter_section[..., 1]
        # union area
        union_area = boxes1_area + boxes2_area - inter_area
        # IOU
        iou = inter_area / union_area

        # top-left corner of the smallest enclosing box
        enclose_left_up = tf.minimum(boxes1[..., :2], boxes2[..., :2])
        # bottom-right corner of the smallest enclosing box
        enclose_right_down = tf.maximum(boxes1[..., 2:], boxes2[..., 2:])
        # height and width of the enclosing box
        enclose = tf.maximum(enclose_right_down - enclose_left_up, 0.0)
        # area of the enclosing box
        enclose_area = enclose[..., 0] * enclose[..., 1]
        # GIOU
        giou = iou - 1.0 * (enclose_area - union_area) / enclose_area

        return giou

    # bounding boxes iou
    def bbox_iou(self, boxes1, boxes2):

        # box areas (boxes here are in xywh form)
        boxes1_area = boxes1[..., 2] * boxes1[..., 3]
        boxes2_area = boxes2[..., 2] * boxes2[..., 3]

        # (x, y, w, h) --> (xmin, ymin, xmax, ymax)
        boxes1 = tf.concat([boxes1[..., :2] - boxes1[..., 2:] * 0.5,
                            boxes1[..., :2] + boxes1[..., 2:] * 0.5], axis=-1)
        boxes2 = tf.concat([boxes2[..., :2] - boxes2[..., 2:] * 0.5,
                            boxes2[..., :2] + boxes2[..., 2:] * 0.5], axis=-1)

        # top-left and bottom-right corners of the intersection
        left_up = tf.maximum(boxes1[..., :2], boxes2[..., :2])
        right_down = tf.minimum(boxes1[..., 2:], boxes2[..., 2:])

        # zero where the two boxes do not intersect
        inter_section = tf.maximum(right_down - left_up, 0.0)
        # intersection area
        inter_area = inter_section[..., 0] * inter_section[..., 1]
        # union area
        union_area = boxes1_area + boxes2_area - inter_area
        # IOU
        iou = 1.0 * inter_area / union_area

        return iou

   (5). Even with the network structure we cannot train yet: we still need the data-handling side, i.e. how to feed the network and how ground truth is processed. That is the job of dataset.py.

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/03/07 14:10
# @Author   : WanDaoYi
# @FileName : dataset.py
# ============================================

import os
import cv2
import numpy as np
import tensorflow as tf
import core.utils as utils
from config import cfg


class Dataset(object):

    def __init__(self, train_flag=True):
        """
        :param train_flag: 是否是训练,默认训练
        """
        self.train_flag = train_flag

        # 训练数据
        if train_flag:
            self.data_file_path = cfg.TRAIN.TRAIN_DATA_PATH
            self.batch_size = cfg.TRAIN.TRAIN_BATCH_SIZE
            pass
        # 验证数据
        else:
            self.data_file_path = cfg.TRAIN.VAL_DATA_PATH
            self.batch_size = cfg.TRAIN.VAL_BATCH_SIZE
            pass

        self.train_input_size_list = cfg.TRAIN.INPUT_SIZE_LIST
        self.strides = np.array(cfg.YOLO.STRIDES)
        self.classes = utils.read_class_names(cfg.COMMON.CLASS_FILE_PATH)
        self.class_num = len(self.classes)
        self.anchor_list = utils.get_anchors(cfg.COMMON.ANCHOR_FILE_PATH)
        self.anchor_per_scale = cfg.YOLO.ANCHOR_PER_SCALE
        self.max_bbox_per_scale = cfg.COMMON.MAX_BBOX_PER_SCALE

        self.annotations = self.read_annotations()
        self.sample_num = len(self.annotations)
        self.batch_num = int(np.ceil(self.sample_num / self.batch_size))
        self.batch_count = 0
        pass

    # iterator
    def __iter__(self):
        return self

    # iterate over Dataset() like a for loop
    def __next__(self):
        with tf.device("/gpu:0"):
            # pick a random value from train_input_size_list as train_input_size
            self.train_input_size = np.random.choice(self.train_input_size_list)
            self.train_output_size = self.train_input_size // self.strides

            # batch of input images
            batch_image = np.zeros((self.batch_size, self.train_input_size, self.train_input_size, 3))

            # label maps for the 3 prediction scales
            batch_label_sbbox = np.zeros((self.batch_size, self.train_output_size[0], self.train_output_size[0],
                                          self.anchor_per_scale, 5 + self.class_num))
            batch_label_mbbox = np.zeros((self.batch_size, self.train_output_size[1], self.train_output_size[1],
                                          self.anchor_per_scale, 5 + self.class_num))
            batch_label_lbbox = np.zeros((self.batch_size, self.train_output_size[2], self.train_output_size[2],
                                          self.anchor_per_scale, 5 + self.class_num))

            # per-scale arrays holding up to max_bbox_per_scale bounding boxes
            batch_sbboxes = np.zeros((self.batch_size, self.max_bbox_per_scale, 4))
            batch_mbboxes = np.zeros((self.batch_size, self.max_bbox_per_scale, 4))
            batch_lbboxes = np.zeros((self.batch_size, self.max_bbox_per_scale, 4))

            num = 0
            # still inside the current epoch?
            if self.batch_count < self.batch_num:
                # this while loop gathers samples one by one until a full batch_size is reached
                while num < self.batch_size:
                    index = self.batch_count * self.batch_size + num
                    # if the last batch runs short of data, wrap around and take samples from the start
                    if index >= self.sample_num:
                        index -= self.sample_num
                    annotation = self.annotations[index]
                    image, bboxes = self.parse_annotation(annotation)
                    label_sbbox, label_mbbox, label_lbbox, sbboxes, mbboxes, lbboxes = self.preprocess_true_boxes(
                        bboxes)

                    batch_image[num, :, :, :] = image

                    # [batch_size, x_scope, y_scope, iou_flag, 5 + classes]
                    batch_label_sbbox[num, :, :, :, :] = label_sbbox
                    batch_label_mbbox[num, :, :, :, :] = label_mbbox
                    batch_label_lbbox[num, :, :, :, :] = label_lbbox

                    batch_sbboxes[num, :, :] = sbboxes
                    batch_mbboxes[num, :, :] = mbboxes
                    batch_lbboxes[num, :, :] = lbboxes

                    num += 1

                self.batch_count += 1

                return batch_image, batch_label_sbbox, batch_label_mbbox, batch_label_lbbox, \
                       batch_sbboxes, batch_mbboxes, batch_lbboxes
            # otherwise, start the next epoch
            else:
                self.batch_count = 0
                np.random.shuffle(self.annotations)
                raise StopIteration
            pass
        pass

    # lets len(Dataset()) return the value of self.batch_num
    def __len__(self):
        return self.batch_num
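
    # A sketch of how this iterator is typically consumed in yolo_train.py (not shown in this post):
    #     trainset = Dataset(train_flag=True)
    #     for batch in trainset:  # one pass = one epoch; StopIteration re-shuffles and resets
    #         (batch_image, batch_label_sbbox, batch_label_mbbox, batch_label_lbbox,
    #          batch_sbboxes, batch_mbboxes, batch_lbboxes) = batch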

    # read the annotation info from the data file
    def read_annotations(self):
        with open(self.data_file_path) as file:
            file_info = file.readlines()
            annotation = [line.strip() for line in file_info if len(line.strip().split()[1:]) != 0]
            np.random.shuffle(annotation)
            return annotation
        pass

    # parse an annotation line into the image and its bounding boxes
    def parse_annotation(self, annotation):
        # split "./data/images\Anime_180.jpg 388,532,588,729,0 917,154,1276,533,0"
        # on whitespace into ['./data/images\\Anime_180.jpg', '388,532,588,729,0', '917,154,1276,533,0']
        line = annotation.split()
        image_path = line[0]
        if not os.path.exists(image_path):
            raise KeyError("%s does not exist ... " % image_path)
        image = np.array(cv2.imread(image_path))
        # turn the boxes into [[388, 532, 588, 729, 0], [917, 154, 1276, 533, 0]]
        bboxes = np.array([list(map(int, box.split(','))) for box in line[1:]])

        # for training data, apply random transforms so the trained model generalizes better
        if self.train_flag:
            image, bboxes = self.random_horizontal_flip(np.copy(image), np.copy(bboxes))
            image, bboxes = self.random_crop(np.copy(image), np.copy(bboxes))
            image, bboxes = self.random_translate(np.copy(image), np.copy(bboxes))

        image, bboxes = utils.image_preporcess(np.copy(image), [self.train_input_size, self.train_input_size],
                                               np.copy(bboxes))
        return image, bboxes

    # random horizontal flip
    def random_horizontal_flip(self, image, bboxes):

        if np.random.random() < 0.5:
            _, w, _ = image.shape
            image = image[:, ::-1, :]
            bboxes[:, [0, 2]] = w - bboxes[:, [2, 0]]

        return image, bboxes

    # random crop
    def random_crop(self, image, bboxes):

        if np.random.random() < 0.5:
            h, w, _ = image.shape
            max_bbox = np.concatenate([np.min(bboxes[:, 0:2], axis=0), np.max(bboxes[:, 2:4], axis=0)], axis=-1)

            max_l_trans = max_bbox[0]
            max_u_trans = max_bbox[1]
            max_r_trans = w - max_bbox[2]
            max_d_trans = h - max_bbox[3]

            crop_xmin = max(0, int(max_bbox[0] - np.random.uniform(0, max_l_trans)))
            crop_ymin = max(0, int(max_bbox[1] - np.random.uniform(0, max_u_trans)))
            # note: the right/bottom bounds must be clamped with min(), not max();
            # with max() the crop always ends at the image border and never trims the right or bottom edges
            crop_xmax = min(w, int(max_bbox[2] + np.random.uniform(0, max_r_trans)))
            crop_ymax = min(h, int(max_bbox[3] + np.random.uniform(0, max_d_trans)))

            image = image[crop_ymin: crop_ymax, crop_xmin: crop_xmax]

            bboxes[:, [0, 2]] = bboxes[:, [0, 2]] - crop_xmin
            bboxes[:, [1, 3]] = bboxes[:, [1, 3]] - crop_ymin

        return image, bboxes

    # random translation: shift horizontally and vertically; vacated pixels become 0 and show up black
    def random_translate(self, image, bboxes):

        if np.random.random() < 0.5:
            h, w, _ = image.shape
            max_bbox = np.concatenate([np.min(bboxes[:, 0:2], axis=0), np.max(bboxes[:, 2:4], axis=0)], axis=-1)

            # the top-left x, y values, i.e. the distances to the left and top edges
            max_l_trans = max_bbox[0]
            max_u_trans = max_bbox[1]
            # the bottom-right distances to the right and bottom edges
            max_r_trans = w - max_bbox[2]
            max_d_trans = h - max_bbox[3]

            # translation offsets, chosen so the objects stay inside the image
            tx = np.random.uniform(-(max_l_trans - 1), (max_r_trans - 1))
            ty = np.random.uniform(-(max_u_trans - 1), (max_d_trans - 1))

            # affine-transform matrix
            M = np.array([[1, 0, tx], [0, 1, ty]])
            # apply the affine transform
            image = cv2.warpAffine(image, M, (w, h))

            # shift the boxes by the same offsets
            bboxes[:, [0, 2]] = bboxes[:, [0, 2]] + tx
            bboxes[:, [1, 3]] = bboxes[:, [1, 3]] + ty

        return image, bboxes

    # preprocess the ground-truth boxes
    def preprocess_true_boxes(self, bboxes):

        # build label maps of shape [train_output_size, train_output_size, anchor_per_scale, 5 + num_classes], zero-filled
        label = [np.zeros((self.train_output_size[i], self.train_output_size[i], self.anchor_per_scale,
                           5 + self.class_num)) for i in range(3)]

        # build the xywh holders, one [max_bbox_per_scale, 4] array per scale
        bboxes_xywh = [np.zeros((self.max_bbox_per_scale, 4)) for _ in range(3)]
        # bbox_count = [0, 0, 0]
        bbox_count = np.zeros((3,))

        # 将 bboxes ['388,532,588,729,0', '917,154,1276,533,0'] list 进行遍历
        for bbox in bboxes:
            # 获取单个 ground truth boxes 的坐标 [xmin, ymin, xmax, ymax]
            bbox_coor = bbox[:4]
            # 获取 ground truth 类别的下标
            bbox_class_ind = bbox[4]

            # 构建一个 c 类 大小的 one_hot list 并用 0 填充
            one_hot = np.zeros(self.class_num, dtype=np.float)
            # 构建真实的 label: 将上面获取到的 ground truth 类别的下标 定义 该类别的 one_hot 值为 1
            one_hot[bbox_class_ind] = 1.0
            # 构建 class_num 长度 的 list,并均匀分布,并填充 1.0 / class_num 值,
            # 让平滑看起来更舒服点,使用倒数值,是为了下面做平滑的时候,方便将总概率凑够 100%
            uniform_distribution = np.full(self.class_num, 1.0 / self.class_num)
            deta = 0.01
            # 对 one_hot 进行平滑处理, 模拟真实预测情况,前景概率是 90+%,但不是 100%; 而背景的概率,也不是 0%
            # 不过,这个平滑也可以不做的,没什么必要,因为 调用 np.argmax() 获取最大概率下标的结果是一样的。
            smooth_one_hot = one_hot * (1 - deta) + deta * uniform_distribution

            # 转换 [xmin, ymin, xmax, ymax] --> [x, y, w, h] bounding boxes 结构
            bbox_xywh = utils.bbox_dxdy_xywh(bbox_coor)

            # 归一化处理,将 ground truth boxes 缩放到 strides=[8, 16, 32] 对应的尺度
            bbox_xywh_scaled = 1.0 * bbox_xywh[np.newaxis, :] / self.strides[:, np.newaxis]

            iou = []
            exist_positive = False
            # the 3 here is the number of yolo v3 prediction scales
            for i in range(3):
                # build the anchors structure [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
                anchors_xywh = np.zeros((self.anchor_per_scale, 4))
                # put the ground truth box center in; the + 0.5 offsets it to the cell center
                # [[x + 0.5, y + 0.5, 0, 0], [x + 0.5, y + 0.5, 0, 0], [x + 0.5, y + 0.5, 0, 0]]
                anchors_xywh[:, 0:2] = np.floor(bbox_xywh_scaled[i, 0:2]).astype(np.int32) + 0.5
                # put the anchor w, h in, e.g. the small-scale anchors:
                # [[x + 0.5, y + 0.5, 10, 13], [x + 0.5, y + 0.5, 16, 30], [x + 0.5, y + 0.5, 33, 23]]
                anchors_xywh[:, 2:4] = self.anchor_list[i]

                # compute the IOU between the ground truth box and the anchor boxes
                # [x, y, w, h] --> [xmin, ymin, xmax, ymax]
                ground_truth_scaled = utils.bbox_xywh_dxdy(bbox_xywh_scaled[i][np.newaxis, :])
                anchor_boxes = utils.bbox_xywh_dxdy(anchors_xywh)

                # scale, then center, then compare by IOU to decide positive samples;
                # anchor_boxes holds the 3 anchors of this scale, so iou_scale has 3 IOU values
                iou_scale = utils.bboxes_iou(ground_truth_scaled, anchor_boxes)
                iou.append(iou_scale)
                # iou_mask is a list of 3 bool elements
                iou_mask = iou_scale > 0.3

                # np.any is a logical OR: True as soon as any element is True
                if np.any(iou_mask):
                    # grid cell indices of the center point
                    xind, yind = np.floor(bbox_xywh_scaled[i, 0:2]).astype(np.int32)

                    label[i][yind, xind, iou_mask, :] = 0
                    # in the output-sized feature map, find the cell that the scaled center
                    # falls into, then assign bbox_xywh, conf and prob there
                    label[i][yind, xind, iou_mask, 0:4] = bbox_xywh
                    # reaching this branch means IOU > 0.3: there is overlap, it is foreground,
                    # so the confidence is 1.0
                    label[i][yind, xind, iou_mask, 4:5] = 1.0
                    label[i][yind, xind, iou_mask, 5:] = smooth_one_hot

                    # index of this bbox in the per-scale buffer; bbox_count = [0, 0, 0]
                    bbox_ind = int(bbox_count[i] % self.max_bbox_per_scale)
                    # bbox_ind is the ordinal of this ground truth box within the image
                    bboxes_xywh[i][bbox_ind, :4] = bbox_xywh

                    bbox_count[i] += 1

                    exist_positive = True

            if not exist_positive:
                # index of the largest IOU value across all scales and anchors
                best_anchor_ind = np.argmax(np.array(iou).reshape(-1), axis=-1)
                best_detect = int(best_anchor_ind / self.anchor_per_scale)
                # the best anchor within that scale
                best_anchor = int(best_anchor_ind % self.anchor_per_scale)
                # grid cell of the center point at the best scale
                xind, yind = np.floor(bbox_xywh_scaled[best_detect, 0:2]).astype(np.int32)

                label[best_detect][yind, xind, best_anchor, :] = 0
                label[best_detect][yind, xind, best_anchor, 0:4] = bbox_xywh
                label[best_detect][yind, xind, best_anchor, 4:5] = 1.0
                label[best_detect][yind, xind, best_anchor, 5:] = smooth_one_hot

                bbox_ind = int(bbox_count[best_detect] % self.max_bbox_per_scale)
                # bbox_ind is the ordinal of this ground truth box within the image
                bboxes_xywh[best_detect][bbox_ind, :4] = bbox_xywh

                bbox_count[best_detect] += 1

        label_sbbox, label_mbbox, label_lbbox = label
        sbboxes, mbboxes, lbboxes = bboxes_xywh
        return label_sbbox, label_mbbox, label_lbbox, sbboxes, mbboxes, lbboxes
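
To make the grid assignment above concrete, here is a tiny self-contained sketch (the box numbers are made up for illustration, and the project's utils helpers are left out) of how one [x, y, w, h] ground truth box maps onto the three stride grids:

import numpy as np

# illustrative numbers only: a ground truth box in a 416 x 416 input,
# in [x, y, w, h] form (center x, center y, width, height)
bbox_xywh = np.array([200., 300., 100., 150.])
strides = np.array([8, 16, 32])

# same scaling as in preprocess_true_boxes: one row per prediction scale
bbox_xywh_scaled = bbox_xywh[np.newaxis, :] / strides[:, np.newaxis]
print(bbox_xywh_scaled)
# [[25.     37.5    12.5    18.75  ]
#  [12.5    18.75    6.25    9.375 ]
#  [ 6.25    9.375   3.125   4.6875]]

# the grid cell responsible for the box at each scale
for i, stride in enumerate(strides):
    xind, yind = np.floor(bbox_xywh_scaled[i, 0:2]).astype(np.int32)
    print("stride %2d -> cell (x=%d, y=%d)" % (stride, xind, yind))
# stride  8 -> cell (x=25, y=37)
# stride 16 -> cell (x=12, y=18)
# stride 32 -> cell (x=6, y=9)

These are exactly the cells whose label entries get filled: any anchor with IOU > 0.3 at a scale becomes a positive there, and if none of the 9 anchors qualifies, the single best one is used instead.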

4. Model training yolo_train.py

    Everything above is in place, so let it fly. Run it and watch: tuning hyper-parameters and picking a checkpoint are both things you do after it is running.

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/03/12 14:40
# @Author   : WanDaoYi
# @FileName : yolo_train.py
# ============================================

import os
from datetime import datetime
import shutil
import numpy as np
import tensorflow as tf
from tqdm import tqdm
from core.dataset import Dataset
from core.yolov3 import YoloV3
from config import cfg


class YoloTrain(object):

    def __init__(self):
        # learning rate range
        self.learning_rate_init = cfg.TRAIN.LEARNING_RATE_INIT
        self.learning_rate_end = cfg.TRAIN.LEARNING_RATE_END
        # epochs of the two training stages
        self.first_stage_epochs = cfg.TRAIN.FIRST_STAGE_EPOCHS
        self.second_stage_epochs = cfg.TRAIN.SECOND_STAGE_EPOCHS
        # warm-up epochs
        self.warm_up_epochs = cfg.TRAIN.WARM_UP_EPOCHS
        # initial weights of the model
        self.initial_weight = cfg.TRAIN.INITIAL_WEIGHT
        # moving-average decay rate
        self.moving_ave_decay = cfg.COMMON.MOVING_AVE_DECAY
        # maximum number of boxes per scale
        self.max_bbox_per_scale = cfg.COMMON.MAX_BBOX_PER_SCALE
        # training log path
        self.train_log = cfg.TRAIN.TRAIN_LOG
        # validation log path
        self.val_log = cfg.TRAIN.VAL_LOG

        # training dataset
        self.train_data = Dataset()
        # number of batches in one epoch
        self.batch_num = len(self.train_data)
        # validation dataset
        self.val_data = Dataset(train_flag=False)

        self.conv_bbox = ['conv_sbbox', 'conv_mbbox', 'conv_lbbox']

        self.train_loss_info = "train loss: %.2f"
        self.ckpt_info = "./checkpoint/val_loss=%.4f.ckpt"
        self.loss_info = "=> Epoch: %2d, Time: %s, Train loss: %.2f, Val loss: %.2f, Saving %s ..."

        # create the Session
        self.config = tf.ConfigProto()
        self.config.gpu_options.allow_growth = True
        self.sess = tf.Session(config=self.config)

        # define the input placeholders, fed via feed_dict
        with tf.name_scope('define_input'):
            self.input_data = tf.placeholder(dtype=tf.float32, name='input_data')
            self.label_sbbox = tf.placeholder(dtype=tf.float32, name='label_sbbox')
            self.label_mbbox = tf.placeholder(dtype=tf.float32, name='label_mbbox')
            self.label_lbbox = tf.placeholder(dtype=tf.float32, name='label_lbbox')
            self.true_sbboxes = tf.placeholder(dtype=tf.float32, name='sbboxes')
            self.true_mbboxes = tf.placeholder(dtype=tf.float32, name='mbboxes')
            self.true_lbboxes = tf.placeholder(dtype=tf.float32, name='lbboxes')
            self.training_flag = tf.placeholder(dtype=tf.bool, name='training')

        # define the loss
        with tf.name_scope("define_loss"):
            self.model = YoloV3(self.input_data, self.training_flag)
            self.net_var = tf.global_variables()
            self.giou_loss, self.conf_loss, self.prob_loss = self.model.compute_loss(
                self.label_sbbox, self.label_mbbox, self.label_lbbox,
                self.true_sbboxes, self.true_mbboxes, self.true_lbboxes)
            self.loss = self.giou_loss + self.conf_loss + self.prob_loss
            pass

        # define the learning rate and its decay schedule
        with tf.name_scope('learn_rate'):

            self.global_step = tf.Variable(1.0, dtype=tf.float64, trainable=False, name='global_step')
            # number of warm-up training batches
            warm_up_steps = tf.constant(self.warm_up_epochs * self.batch_num,
                                        dtype=tf.float64, name='warm_up_steps')

            # total number of training batches
            train_steps = tf.constant((self.first_stage_epochs + self.second_stage_epochs) * self.batch_num,
                                      dtype=tf.float64, name='train_steps')

            # tf.cond() works like an if/else: if pred then true_fn else false_fn
            # tf.cos() is the cosine function
            # together they shrink the learning_rate gradually during training
            self.learn_rate = tf.cond(
                pred=self.global_step < warm_up_steps,
                true_fn=lambda: self.global_step / warm_up_steps * self.learning_rate_init,
                false_fn=lambda: self.learning_rate_end + 0.5 * (self.learning_rate_init - self.learning_rate_end) *
                                 (1 + tf.cos(
                                     (self.global_step - warm_up_steps) / (train_steps - warm_up_steps) * np.pi))
            )

            # behaves like self.global_step += 1; but following tf's rules, the variable
            # graph must be built first, then initialized, and it only executes on run()
            global_step_update = tf.assign_add(self.global_step, 1.0)

            pass

        # exponential moving average of the trainable variables, using the decay rate above
        with tf.name_scope("define_weight_decay"):
            moving_ave = tf.train.ExponentialMovingAverage(self.moving_ave_decay).apply(tf.trainable_variables())
            pass

        # first training stage: train only the detection heads (conv_sbbox / conv_mbbox / conv_lbbox)
        with tf.name_scope("define_first_stage_train"):

            self.first_stage_trainable_var_list = []

            for var in tf.trainable_variables():
                var_name = var.op.name
                var_name_mess = str(var_name).split('/')
                if var_name_mess[0] in self.conv_bbox:
                    self.first_stage_trainable_var_list.append(var)

            first_stage_optimizer = tf.train.AdamOptimizer(self.learn_rate).minimize(self.loss,
                                                                                     var_list=self.first_stage_trainable_var_list)

            with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
                with tf.control_dependencies([first_stage_optimizer, global_step_update]):
                    with tf.control_dependencies([moving_ave]):
                        self.train_op_with_frozen_variables = tf.no_op()

        # second training stage: train all variables
        with tf.name_scope("define_second_stage_train"):

            second_stage_trainable_var_list = tf.trainable_variables()

            second_stage_optimizer = tf.train.AdamOptimizer(self.learn_rate).minimize(self.loss,
                                                                                      var_list=second_stage_trainable_var_list)

            with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
                with tf.control_dependencies([second_stage_optimizer, global_step_update]):
                    with tf.control_dependencies([moving_ave]):
                        self.train_op_with_all_variables = tf.no_op()

        with tf.name_scope('loader_and_saver'):
            self.loader = tf.train.Saver(self.net_var)
            self.saver = tf.train.Saver(tf.global_variables(), max_to_keep=10)

        with tf.name_scope('summary'):
            tf.summary.scalar("learn_rate", self.learn_rate)
            tf.summary.scalar("giou_loss", self.giou_loss)
            tf.summary.scalar("conf_loss", self.conf_loss)
            tf.summary.scalar("prob_loss", self.prob_loss)
            tf.summary.scalar("total_loss", self.loss)

            if os.path.exists(self.train_log):
                shutil.rmtree(self.train_log)

            os.mkdir(self.train_log)

            if os.path.exists(self.val_log):
                shutil.rmtree(self.val_log)
            os.mkdir(self.val_log)

            # merge all the summaries
            self.write_op = tf.summary.merge_all()
            # two tf.summary.FileWriter instances write logs into separate subdirectories,
            # one for the training logs and one for the validation logs; passing sess.graph
            # also makes the computation graph visible in TensorBoard's GRAPHS panel
            self.train_writer = tf.summary.FileWriter(self.train_log, graph=self.sess.graph)
            # validation log writer
            self.val_writer = tf.summary.FileWriter(self.val_log)
            pass

        pass

    def do_train(self):
        # initialize the variables
        self.sess.run(tf.global_variables_initializer())

        try:
            # load an existing model
            print('=> Restoring weights from: %s ... ' % self.initial_weight)
            self.loader.restore(self.sess, self.initial_weight)
        # if the model does not exist, train from scratch instead
        except Exception:
            print('=> %s does not exist !!!' % self.initial_weight)
            print('=> Now it starts to train YOLOV3 from scratch ...')
            # and reset the first-stage epochs to 0
            self.first_stage_epochs = 0

        for epoch in range(1, 1 + self.first_stage_epochs + self.second_stage_epochs):
            if epoch <= self.first_stage_epochs:
                train_op = self.train_op_with_frozen_variables
            else:
                train_op = self.train_op_with_all_variables

            # progress bar
            pbar = tqdm(self.train_data)

            train_epoch_loss = []
            val_epoch_loss = []

            for train_data in pbar:
                _, train_summary, train_step_loss, global_step_val = self.sess.run(
                    [train_op, self.write_op, self.loss, self.global_step], feed_dict={
                        self.input_data: train_data[0],
                        self.label_sbbox: train_data[1],
                        self.label_mbbox: train_data[2],
                        self.label_lbbox: train_data[3],
                        self.true_sbboxes: train_data[4],
                        self.true_mbboxes: train_data[5],
                        self.true_lbboxes: train_data[6],
                        self.training_flag: True,
                    })

                train_epoch_loss.append(train_step_loss)
                self.train_writer.add_summary(train_summary, global_step_val)
                pbar.set_description(self.train_loss_info % train_step_loss)

            for test_data in self.val_data:
                val_summary, val_step_loss = self.sess.run([self.write_op, self.loss],
                                                           feed_dict={
                                                               self.input_data: test_data[0],
                                                               self.label_sbbox: test_data[1],
                                                               self.label_mbbox: test_data[2],
                                                               self.label_lbbox: test_data[3],
                                                               self.true_sbboxes: test_data[4],
                                                               self.true_mbboxes: test_data[5],
                                                               self.true_lbboxes: test_data[6],
                                                               self.training_flag: False,
                                                           })

                val_epoch_loss.append(val_step_loss)
                self.val_writer.add_summary(val_summary, epoch + 1)

            train_epoch_loss = np.mean(train_epoch_loss)
            val_epoch_loss = np.mean(val_epoch_loss)
            ckpt_file = self.ckpt_info % val_epoch_loss
            now_time = datetime.now()
            print(self.loss_info % (epoch, now_time, train_epoch_loss, val_epoch_loss, ckpt_file))
            self.saver.save(self.sess, ckpt_file, global_step=epoch)


if __name__ == "__main__":
    # start time
    start_time = datetime.now()
    print("Start time: {}".format(start_time))

    demo = YoloTrain()
    demo.do_train()

    # end time
    end_time = datetime.now()
    print("End time: {}, training took: {}".format(end_time, end_time - start_time))
    pass
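
The warm-up plus cosine schedule defined in the learn_rate scope above is easier to see outside of TensorFlow. Below is a minimal numpy sketch of the same formula; the hyper-parameter values are made-up examples, not the project's config:

import numpy as np

learning_rate_init = 1e-4
learning_rate_end = 1e-6
warm_up_steps = 2 * 100      # e.g. 2 warm-up epochs x 100 batches per epoch
train_steps = 32 * 100       # e.g. (2 + 30) total epochs x 100 batches per epoch

def learn_rate(global_step):
    if global_step < warm_up_steps:
        # linear warm-up from 0 up to learning_rate_init
        return global_step / warm_up_steps * learning_rate_init
    # cosine decay from learning_rate_init down to learning_rate_end
    progress = (global_step - warm_up_steps) / (train_steps - warm_up_steps)
    return learning_rate_end + 0.5 * (learning_rate_init - learning_rate_end) * (1 + np.cos(progress * np.pi))

for step in [1, 100, 200, 1700, 3200]:
    print("step %4d -> lr %.3e" % (step, learn_rate(step)))

At step 200 (end of warm-up) the rate peaks at learning_rate_init, at the halfway point of the cosine phase it is roughly halfway between the two bounds, and by the last step it has decayed to learning_rate_end.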

 

5. Model freezing model_freeze.py

    Once training has produced .ckpt model files, we usually want to freeze them into a single .pb file

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/03/16 15:22
# @Author   : WanDaoYi
# @FileName : model_freeze.py
# ============================================

from datetime import datetime
from config import cfg
import tensorflow as tf
from core.yolov3 import YoloV3


class ModelFreeze(object):

    def __init__(self):
        pass

    # build the yolo graph nodes and freeze the model
    def yolo_model(self):

        output_node_names = cfg.FREEZE.YOLO_OUTPUT_NODE_NAME

        ckpt_model_path = cfg.FREEZE.CKPT_MODEL_PATH
        pb_model_path = cfg.FREEZE.PB_MODEL_PATH

        # create the input node
        with tf.name_scope('input'):
            input_data = tf.placeholder(dtype=tf.float32, name='input_data')
        # building YoloV3 here defines the graph, so the Saver below can find the variables
        model = YoloV3(input_data, training_flag=False)

        self.freeze_model(ckpt_model_path, pb_model_path, output_node_names)
        pass

    # freeze the model into a .pb file
    def freeze_model(self, ckpt_file, pb_file, output_node_names):

        sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
        saver = tf.train.Saver()

        saver.restore(sess, ckpt_file)
        converted_graph_def = tf.graph_util.convert_variables_to_constants(sess,
                                                                           input_graph_def=sess.graph.as_graph_def(),
                                                                           output_node_names=output_node_names)

        with tf.gfile.GFile(pb_file, "wb") as f:
            f.write(converted_graph_def.SerializeToString())
        pass


if __name__ == '__main__':
    # start time
    start_time = datetime.now()
    print("Start time: {}".format(start_time))

    demo = ModelFreeze()
    demo.yolo_model()

    # end time
    end_time = datetime.now()
    print("End time: {}, freezing the model took: {}".format(end_time, end_time - start_time))

 

6. Model testing

    Image testing yolo_test.py

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/03/17 00:01
# @Author   : WanDaoYi
# @FileName : yolo_test.py
# ============================================

from datetime import datetime
import os
import cv2
import numpy as np
import tensorflow as tf
import shutil
from core import utils
from config import cfg


class YoloTest(object):

    def __init__(self):
        # pb model path
        self.pb_model_path = cfg.TEST.TEST_PB_MODEL_PATH
        # the yolov3 network returns output nodes at 3 scales
        self.return_elements = cfg.TEST.RETURN_ELEMENTS
        # class_name dictionary
        self.class_name_dir = utils.read_class_names(cfg.COMMON.CLASS_FILE_PATH)
        # number of classes
        self.class_name_len = len(self.class_name_dir)
        # input size
        self.input_size = cfg.TEST.INPUT_SIZE
        # output folder for images
        self.output_image_file = cfg.TEST.OUTPUT_IMAGE_FILE
        # output folder for predicted box info
        self.output_box_info_file = cfg.TEST.OUTPUT_BOX_INFO_FILE
        # whether to save images with the predicted boxes drawn, True by default
        self.save_boxes_image_flag = cfg.TEST.SAVE_BOXES_IMAGE_FLAG

        self.graph = tf.Graph()
        # load the model
        self.return_tensors = utils.read_pb_return_tensors(self.graph,
                                                           self.pb_model_path,
                                                           self.return_elements)
        self.sess = tf.Session(graph=self.graph)
        pass

    def object_predict(self, data_line_list):

        for data_line in data_line_list:

            data_info_list = data_line.strip().split()
            image_path = data_info_list[0]
            image_path = image_path.replace("\\", "/")
            image_name = image_path.split("/")[-1]
            txt_name = image_name.split(".")[0] + ".txt"

            image_info = cv2.imread(image_path)

            pred_box = self.do_predict(image_info, image_name)
            print("predict result of {}".format(image_name))

            output_box_info_path = os.path.join(self.output_box_info_file, txt_name)

            # save the predicted box info
            with open(output_box_info_path, 'w') as f:
                for bbox in pred_box:
                    coor = np.array(bbox[:4], dtype=np.int32)
                    score = bbox[4]
                    class_ind = int(bbox[5])
                    class_name = self.class_name_dir[class_ind]
                    score = '%.4f' % score
                    x_min, y_min, x_max, y_max = list(map(str, coor))
                    bbox_mess = ' '.join([class_name, score, x_min, y_min, x_max, y_max]) + '\n'
                    f.write(bbox_mess)
                    print('\t' + str(bbox_mess).strip())
                    pass
                pass

            pass
        pass

    # run prediction on one image
    def do_predict(self, image_info, image_name):
        image_shape = image_info.shape[: 2]
        # image_2_rgb = cv2.cvtColor(image_info, cv2.COLOR_BGR2RGB)
        image_data = utils.image_preporcess(np.copy(image_info),
                                            [self.input_size, self.input_size])
        image_data = image_data[np.newaxis, ...]
        pred_sbbox, pred_mbbox, pred_lbbox = self.sess.run(
            [self.return_tensors[1], self.return_tensors[2], self.return_tensors[3]],
            feed_dict={self.return_tensors[0]: image_data})

        pred_bbox = np.concatenate([np.reshape(pred_sbbox, (-1, 5 + self.class_name_len)),
                                    np.reshape(pred_mbbox, (-1, 5 + self.class_name_len)),
                                    np.reshape(pred_lbbox, (-1, 5 + self.class_name_len))],
                                   axis=0)

        bboxes = utils.postprocess_boxes(pred_bbox, image_shape, self.input_size, 0.3)
        pred_box = utils.nms(bboxes, 0.45, method='nms')

        if self.save_boxes_image_flag:
            new_image = utils.draw_bbox(image_info, pred_box)
            new_image = cv2.cvtColor(new_image, cv2.COLOR_BGR2RGB)
            save_image_path = os.path.join(self.output_image_file, image_name)
            cv2.imwrite(save_image_path, new_image)
            pass

        # # display the image
        # new_image = utils.draw_bbox(image_2_rgb, pred_box)
        # cv2.imshow("predict_image", new_image)
        # new_image.show()
        # cv2.waitKey(0)

        return pred_box
        pass


if __name__ == '__main__':

    # start time
    start_time = datetime.now()
    print("Start time: {}".format(start_time))

    # list of image data paths
    data_path_list = utils.read_data_path(cfg.TEST.TEST_DATA_PATH)

    demo = YoloTest()

    if os.path.exists(demo.output_image_file):
        shutil.rmtree(demo.output_image_file)

    if os.path.exists(demo.output_box_info_file):
        shutil.rmtree(demo.output_box_info_file)

    os.mkdir(demo.output_image_file)
    os.mkdir(demo.output_box_info_file)

    demo.object_predict(data_path_list)

    # end time
    end_time = datetime.now()
    print("End time: {}, prediction took: {}".format(end_time, end_time - start_time))


    Video testing yolo_video.py

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
# ============================================
# @Time     : 2020/03/18 22:26
# @Author   : WanDaoYi
# @FileName : yolo_video.py
# ============================================

from datetime import datetime
import cv2
import numpy as np
import tensorflow as tf
from core import utils
from config import cfg


class YoloVideo(object):

    def __init__(self):
        # pb model path
        self.pb_model_path = cfg.TEST.TEST_PB_MODEL_PATH
        # the yolov3 network returns output nodes at 3 scales
        self.return_elements = cfg.TEST.RETURN_ELEMENTS
        # class_name dictionary
        self.class_name_dir = utils.read_class_names(cfg.COMMON.CLASS_FILE_PATH)
        # number of classes
        self.class_name_len = len(self.class_name_dir)
        # input size
        self.input_size = cfg.TEST.INPUT_SIZE
        # video file path (note: VEDIO_PATH is spelled this way in config.py)
        self.video_path = cfg.TEST.VEDIO_PATH

        self.graph = tf.Graph()
        # load the model
        self.return_tensors = utils.read_pb_return_tensors(self.graph,
                                                           self.pb_model_path,
                                                           self.return_elements)
        self.sess = tf.Session(graph=self.graph)
        pass

    # process the video stream
    def do_video(self):
        vid = cv2.VideoCapture(self.video_path)
        while True:
            # frame as read by OpenCV (BGR channel order)
            return_value, frame = vid.read()
            if return_value:
                # utils.image_preporcess applies cv2.COLOR_BGR2RGB internally;
                # if you wrote your own model, check whether this conversion is needed here
                frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
                pass
            else:
                raise ValueError("No image!")
                pass

            frame_size = frame.shape[:2]
            # the color space was already converted once this way during training
            image_data = utils.image_preporcess(np.copy(frame), [self.input_size, self.input_size])
            image_data = image_data[np.newaxis, ...]

            pred_start_time = datetime.now()

            pred_sbbox, pred_mbbox, pred_lbbox = self.sess.run(
                [self.return_tensors[1], self.return_tensors[2], self.return_tensors[3]],
                feed_dict={self.return_tensors[0]: image_data})

            pred_bbox = np.concatenate([np.reshape(pred_sbbox, (-1, 5 + self.class_name_len)),
                                        np.reshape(pred_mbbox, (-1, 5 + self.class_name_len)),
                                        np.reshape(pred_lbbox, (-1, 5 + self.class_name_len))],
                                       axis=0)

            bboxes = utils.postprocess_boxes(pred_bbox, frame_size, self.input_size, 0.3)
            bboxes = utils.nms(bboxes, 0.45, method='nms')
            image = utils.draw_bbox(frame, bboxes)

            pred_end_time = datetime.now()
            print("一帧耗时: {}".format(pred_end_time - pred_start_time))

            cv2.namedWindow("result", cv2.WINDOW_AUTOSIZE)
            result = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
            cv2.imshow("result", result)
            # press q to quit
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
            pass
        pass


if __name__ == '__main__':

    # start time
    start_time = datetime.now()
    print("Start time: {}".format(start_time))

    demo = YoloVideo()
    demo.do_video()

    # end time
    end_time = datetime.now()
    print("End time: {}, video inference took: {}".format(end_time, end_time - start_time))

 

7. In this project, you can pick a model by watching the loss and the logs, or build your own mAP evaluation to select one. I have not written mAP code here; when I get the chance, I will cover it in detail in a later post.

Back to main directory

Back to the Object Detection History directory

Previous chapter: In-Depth Series, Object Detection History (VI): YOLO-V3 Object Detection in Detail

Next chapter: In-Depth Series, Object Detection History (VIII): CornerNet-Lite Object Detection in Detail
