Andrew Ng Deep Learning, Course 4 Week 3 Assignment: Autonomous Driving - Car Detection

1. Deeplearning-assignment

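In this week's assignment, you will learn about object detection using a large YOLO model.

You will learn to:

Use object detection on a car detection dataset
Deal with bounding boxes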

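Problem statement: You are working on a self-driving car. As a critical component of this project, you'd like to first build a car detection system. To collect data, you've mounted a camera on the hood (meaning the front) of the car, which takes pictures of the road ahead every few seconds while you drive.

You've gathered all these images into a folder and labelled them by drawing a bounding box around every car you found. Here's an example of what your bounding boxes look like.

If you have 80 classes that you want YOLO to recognize, you can represent the class label c either as an integer from 1 to 80, or as an 80-dimensional vector (with 80 numbers) in which one component is 1 and the rest are 0. The video lectures used the latter representation.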
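The two label encodings can be converted into each other with a few lines of NumPy. This is a minimal sketch assuming 1-based integer labels as in the text; the helpers `to_one_hot` and `from_one_hot` are hypothetical and not part of the assignment code.

```python
import numpy as np

NUM_CLASSES = 80

def to_one_hot(c, num_classes=NUM_CLASSES):
    """Convert a 1-based integer class label c in [1, num_classes] to a one-hot vector."""
    if not 1 <= c <= num_classes:
        raise ValueError("class label out of range")
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[c - 1] = 1.0  # shift the 1-based label to a 0-based index
    return vec

def from_one_hot(vec):
    """Recover the 1-based integer label from a one-hot vector."""
    return int(np.argmax(vec)) + 1

v = to_one_hot(3)
print(v.shape, v[:5])  # (80,) [0. 0. 1. 0. 0.]
```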

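The YOLO algorithm

YOLO ("you only look once") is a popular algorithm because it achieves high accuracy while also being able to run in real time. It "only looks once" at the image in the sense that it needs only one forward propagation pass through the network to make its predictions. After non-max suppression, it outputs the recognized objects together with their bounding boxes.

Non-max suppression means keeping only the most probable prediction for an object and suppressing the other predictions that are close to it (overlap it) but have lower scores.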

Some details about the model:

The input is a batch of images of shape (m, 608, 608, 3).
The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers $(p_c, b_x, b_y, b_h, b_w, c)$. If you expand $c$ into an 80-dimensional vector, each bounding box is then represented by 85 numbers.

We will use 5 anchor boxes. So you can think of the YOLO architecture as the following: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85).

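Now let's take a closer look at what this encoding represents.

If the center/midpoint of an object falls into a grid cell, that grid cell is responsible for detecting that object.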
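To make the "responsible cell" rule concrete, here is a small sketch that maps an object's center to its grid cell. It assumes pixel coordinates on the 608x608 input, where each of the 19x19 cells covers 608 / 19 = 32 pixels; the helper `responsible_cell` and the (row, col) ordering are illustrative choices, not part of the assignment code.

```python
GRID_SIZE = 19
IMAGE_SIZE = 608
CELL_SIZE = IMAGE_SIZE // GRID_SIZE  # 608 / 19 = 32 pixels per cell

def responsible_cell(x_center, y_center):
    """Return (row, col) of the grid cell containing the object's center.

    Coordinates are pixels in the 608x608 input image; centers lying exactly
    on the right/bottom border are clamped into the last cell.
    """
    col = min(int(x_center // CELL_SIZE), GRID_SIZE - 1)
    row = min(int(y_center // CELL_SIZE), GRID_SIZE - 1)
    return row, col

print(responsible_cell(300.0, 100.0))  # (3, 9)
```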

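Since we use 5 anchor boxes, each of the 19x19 cells encodes information about 5 boxes. For simplicity, we flatten the last two dimensions of the (19, 19, 5, 85) encoding, so the output of the Deep CNN is (19, 19, 425).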
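The flattening is just a reshape, and it loses no information. A minimal NumPy sketch, with a random array standing in for the real encoding and assuming NumPy's default row-major (C-order) layout:

```python
import numpy as np

# Random stand-in for the Deep CNN encoding of a single image
encoding = np.random.rand(19, 19, 5, 85).astype(np.float32)

# Flatten the last two axes: 5 anchors x 85 numbers = 425 numbers per cell
flat = encoding.reshape(19, 19, 5 * 85)
print(flat.shape)  # (19, 19, 425)

# In row-major order, anchor a occupies flat[..., a * 85:(a + 1) * 85],
# so reshaping back recovers the original (19, 19, 5, 85) layout exactly
restored = flat.reshape(19, 19, 5, 85)
```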

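Now, for each box of each cell, we compute the class scores $\text{score}_{c,i} = p_c \times c_i$: the probability that the box contains an object multiplied by the probability that this object belongs to class $i$. This gives the probability that the box contains each class.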
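A NumPy sketch of this computation, with random arrays standing in for the model's `box_confidence` ($p_c$) and `box_class_probs` ($c_i$); the shapes follow the (19, 19, 5, ...) layout used by `yolo_filter_boxes` in the code below. The class probabilities are normalized here (an assumption of the sketch) so that each box's scores sum to its $p_c$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical model outputs for a 19x19 grid with 5 anchors and 80 classes
box_confidence = rng.random((19, 19, 5, 1))         # p_c for each box
box_class_probs = rng.random((19, 19, 5, 80))
box_class_probs /= box_class_probs.sum(axis=-1, keepdims=True)  # c_i sums to 1 per box

# score for class i = p_c * c_i; the trailing axis of size 1 broadcasts over 80 classes
box_scores = box_confidence * box_class_probs
print(box_scores.shape)  # (19, 19, 5, 80)
```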

Here is one way to visualize what YOLO is predicting on an image:

For each of the 19x19 grid cells, find the maximum of the probability scores (taking a max across both the 5 anchor boxes and across different classes).
Color that grid cell according to what object that grid cell considers the most likely.
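The per-cell maximum can be computed by merging the anchor and class axes before taking the argmax. This sketch uses random scores as a stand-in for real model output; `best_class` is the map you would color by.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical box scores with shape (19, 19, 5 anchors, 80 classes)
box_scores = rng.random((19, 19, 5, 80))

# Merge anchors and classes into one axis of 400, then take the max per grid cell
per_cell = box_scores.reshape(19, 19, 5 * 80)
best_flat = per_cell.argmax(axis=-1)   # flat index a * 80 + c, in [0, 400)
best_class = best_flat % 80            # class that wins in each cell
best_anchor = best_flat // 80          # anchor box that produced that score
best_score = per_cell.max(axis=-1)

print(best_class.shape)  # (19, 19)
```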

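Doing this results in the following picture:

Another way to visualize YOLO's output is to plot the bounding boxes that it outputs. Doing that results in a visualization like this:

In the figure above, we plotted only the boxes to which the model assigned a high probability, but that is still too many boxes. To filter the algorithm's output down to a much smaller number of detected objects, you will use non-max suppression. Specifically, you will carry out these steps:

Get rid of boxes with a low score (meaning the box is not very confident about detecting a class).
Select only one box when several boxes overlap with each other and detect the same object.

You will apply a first filter by thresholding: discard every box whose class score is lower than the threshold you choose.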
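The first filter can be sketched in plain NumPy. It mirrors the logic of `yolo_filter_boxes` in the code below but is only an illustrative, framework-free version; `filter_boxes_np` and the tiny input arrays are hypothetical.

```python
import numpy as np

def filter_boxes_np(box_confidence, boxes, box_class_probs, threshold=0.6):
    """Score thresholding in NumPy.

    box_confidence: (..., 1), boxes: (..., 4), box_class_probs: (..., num_classes).
    """
    box_scores = box_confidence * box_class_probs      # p_c * c_i
    box_classes = box_scores.argmax(axis=-1)           # best class per box
    box_class_scores = box_scores.max(axis=-1)         # score of that class
    mask = box_class_scores >= threshold               # boolean, one entry per box
    # Boolean indexing flattens the kept entries into 1-D (scores, classes) and 2-D (boxes)
    return box_class_scores[mask], boxes[mask], box_classes[mask]

# Tiny hand-made example: 1x1 grid, 2 anchors, 3 classes
box_confidence = np.array([[[[0.9], [0.2]]]])                        # (1, 1, 2, 1)
box_class_probs = np.array([[[[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]]])   # (1, 1, 2, 3)
boxes = np.array([[[[0, 0, 1, 1], [1, 1, 2, 2]]]], dtype=float)      # (1, 1, 2, 4)

scores, kept_boxes, classes = filter_boxes_np(box_confidence, boxes, box_class_probs, threshold=0.6)
# Box 0 scores 0.9 * 0.8 = 0.72 for class 1 and is kept; box 1's best score is 0.2 * 0.4 = 0.08
print(classes)  # [1]
```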

2. Algorithm code (tested, runs successfully)

import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from week3.yolo_utils import *
from week3.yad2k.models.keras_yolo import *


os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

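def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=.6):
    # Score of class i for each box = p_c * c_i, shape (19, 19, 5, 80)
    box_scores = box_confidence * box_class_probs

    # For each box, keep the index of the best class and its score
    box_classes = K.argmax(box_scores, axis=-1)
    box_class_scores = K.max(box_scores, axis=-1, keepdims=False)

    # Boolean mask: True for boxes whose best class score reaches the threshold
    filtering_mask = box_class_scores >= threshold

    # Keep only the masked boxes (the results are flattened over the grid and anchors)
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)

    return scores, boxes, classes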


# with tf.Session() as test_a:
#     box_confidence = tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed=1)
#     boxes = tf.random_normal([19, 19, 5, 4], mean=1, stddev=4, seed=1)
#     box_class_probs = tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed=1)
#     scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=0.5)
#     print("scores[2] = " + str(scores[2].eval()))
#     print("boxes[2] = " + str(boxes[2].eval()))
#     print("classes[2] = " + str(classes[2].eval()))
#     print("scores.shape = " + str(scores.shape))
#     print("boxes.shape = " + str(boxes.shape))
#     print("classes.shape = " + str(classes.shape))


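def iou(box1, box2):
    # Boxes are corner coordinates (x1, y1, x2, y2); corners of the intersection rectangle
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])
    # Clip at 0 so that non-overlapping boxes get zero intersection instead of a spurious positive area
    inter_area = max(yi2 - yi1, 0) * max(xi2 - xi1, 0)

    box1_area = (box1[3] - box1[1]) * (box1[2] - box1[0])
    box2_area = (box2[3] - box2[1]) * (box2[2] - box2[0])
    union_area = box1_area + box2_area - inter_area

    iou = inter_area / union_area

    return iou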


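def yolo_non_max_suppression(scores, boxes, classes, max_boxes=10, iou_threshold=0.5):
    # Tensor copy of max_boxes (kept from the assignment template; the call below passes the int directly)
    max_boxes_tensor = K.variable(max_boxes, dtype='int32')
    K.get_session().run(tf.variables_initializer([max_boxes_tensor]))

    # Indices of the boxes kept by TensorFlow's built-in non-max suppression
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes, iou_threshold)

    # Select the scores, boxes and classes of the kept boxes
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)

    return scores, boxes, classes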


# with tf.Session() as test_b:
#     scores = tf.random_normal([54, ], mean=1, stddev=4, seed=1)
#     boxes = tf.random_normal([54, 4], mean=1, stddev=4, seed=1)
#     classes = tf.random_normal([54, ], mean=1, stddev=4, seed=1)
#     scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes)
#     print("scores[2] = " + str(scores[2].eval()))
#     print("boxes[2] = " + str(boxes[2].eval()))
#     print("classes[2] = " + str(classes[2].eval()))
#     print("scores.shape = " + str(scores.eval().shape))
#     print("boxes.shape = " + str(boxes.eval().shape))
#     print("classes.shape = " + str(classes.eval().shape))


def yolo_eval(yolo_outputs, image_shape=(720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    # Unpack the four tensors produced by yolo_head
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs

    # Convert (center, width/height) boxes to corner coordinates
    boxes = yolo_boxes_to_corners(box_xy, box_wh)

    # Step 1: score thresholding
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, score_threshold)

    # Scale boxes to the original image dimensions (image_shape)
    boxes = scale_boxes(boxes, image_shape)

    # Step 2: non-max suppression
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)

    return scores, boxes, classes


# with tf.Session() as test_b:
#     yolo_outputs = (tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed=1),
#                     tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed=1),
#                     tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed=1),
#                     tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed=1))
#     scores, boxes, classes = yolo_eval(yolo_outputs)
#     print("scores[2] = " + str(scores[2].eval()))
#     print("boxes[2] = " + str(boxes[2].eval()))
#     print("classes[2] = " + str(classes[2].eval()))
#     print("scores.shape = " + str(scores.eval().shape))
#     print("boxes.shape = " + str(boxes.eval().shape))
#     print("classes.shape = " + str(classes.eval().shape))


sess = K.get_session()
class_names = read_classes("e:/code/Python/DeepLearning/Convolution model/week3/model_data/coco_classes.txt")
anchors = read_anchors("e:/code/Python/DeepLearning/Convolution model/week3/model_data/yolo_anchors.txt")
image_shape = (720., 1280.)

yolo_model = load_model("e:/code/Python/DeepLearning/Convolution model/week3/model_data/yolo.h5")

yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))

scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)


def predict(sess, image_file):

    image, image_data = preprocess_image("e:/code/Python/DeepLearning/Convolution model/week3/images/" + image_file, model_image_size=(608, 608))

    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes],
                                                  feed_dict={yolo_model.input: image_data, K.learning_phase(): 0})

    print('Found {} boxes for {}'.format(len(out_boxes), image_file))

    colors = generate_colors(class_names)

    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)

    image.save(os.path.join("e:/code/Python/DeepLearning/Convolution model/week3/out", image_file), quality=90)

    # scipy.misc.imread was removed in SciPy 1.2; matplotlib's imread reads the saved JPEG (via Pillow)
    output_image = plt.imread(os.path.join("e:/code/Python/DeepLearning/Convolution model/week3/out", image_file))
    imshow(output_image)

    return out_scores, out_boxes, out_classes


# out_scores, out_boxes, out_classes = predict(sess, "test.jpg")
for i in range(1, 121):
    # Test images are named 0001.jpg ... 0120.jpg, so zero-pad the index to 4 digits
    image_file = "{:04d}.jpg".format(i)
    out_scores, out_boxes, out_classes = predict(sess, image_file)

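3. Summary

Summary of the YOLO car detection model:

Input image (608, 608, 3)
The input image goes through a CNN, which outputs a (19, 19, 5, 85) volume.
After flattening the last two dimensions, the output has shape (19, 19, 425):
- Each cell in a 19x19 grid over the input image gives 425 numbers.
- 425 = 5 x 85 because each cell contains predictions for 5 boxes, corresponding to 5 anchor boxes, as seen in lecture.
- 85 = 5 + 80, where 5 is because $(p_c, b_x, b_y, b_h, b_w)$ has 5 numbers, and 80 is the number of classes we'd like to detect.
The boxes are then filtered as follows to obtain the final bounding boxes:
- Score-thresholding: throw away boxes whose class detection score is below the threshold
- Non-max suppression: compute the Intersection over Union (IoU) and avoid selecting overlapping boxes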
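As a framework-free sketch of the second filter, here is a greedy non-max suppression in NumPy. It assumes boxes are corner coordinates (y1, x1, y2, x2), the order that `tf.image.non_max_suppression` expects; `iou_np` and `nms_np` are hypothetical names, and this illustration is not claimed to reproduce TensorFlow's implementation exactly (tie-breaking between equal scores, for example, may differ). Like `yolo_non_max_suppression` above, it ignores the class labels.

```python
import numpy as np

def iou_np(box, boxes):
    """IoU between one box and an array of boxes, all given as (y1, x1, y2, x2)."""
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(y2 - y1, 0, None) * np.clip(x2 - x1, 0, None)  # 0 if no overlap
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms_np(scores, boxes, max_boxes=10, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes whose IoU with it exceeds iou_threshold, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0 and len(keep) < max_boxes:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou_np(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]  # survivors overlap the kept box only a little
    return np.array(keep, dtype=int)

boxes = np.array([[0, 0, 2, 2], [0, 0, 2, 1.9], [3, 3, 4, 4]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms_np(scores, boxes))  # [0 2]: box 1 overlaps box 0 with IoU 0.95 and is suppressed
```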

What you should remember from this assignment:

YOLO is a state-of-the-art object detection model that is fast and accurate.
It runs an input image through a CNN, which outputs a 19x19x5x85 dimensional volume.
The encoding can be seen as a grid where each of the 19x19 cells contains information about 5 boxes.
You filter through all the boxes using non-max suppression. Specifically:
- Score thresholding on the probability of detecting a class to keep only accurate (high probability) boxes
- Intersection over Union (IoU) thresholding to eliminate overlapping boxes