Andrew Ng's Coursera Deep Learning Specialization, deeplearning.ai (4-3) Object Detection -- Programming Assignment

Autonomous Driving - Car Detection

The Week 3 assignment uses the YOLO model to detect and localize cars. The implementation mainly draws on the two YOLO papers listed in the References section at the end.

Imports

import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body

%matplotlib inline

1. Problem Statement

You are working on a self-driving car. As a critical component of this project, you would like to build a car detection system. To collect data, you have mounted a camera on the front of the car, which takes pictures of the road ahead every few seconds.

You have now gathered and labeled the data, marking every car with a bounding box and its coordinates, as shown below:

image

If you have 80 classes that you want YOLO to recognize, you can represent the class label c either as an integer from 1 to 80, or as an 80-dimensional vector in which every component is 0 except for a 1 in the position of the detected class.

The lectures used the latter (vector) representation. In this assignment both representations are used, whichever is more convenient at a given step.
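As a quick, hypothetical illustration (not part of the assignment code), the two representations are related by a simple one-hot encoding:

import numpy as np

c = 2                              # label representation: a single class index (0-based here for indexing)
one_hot = np.zeros(80)             # vector representation: 80-dimensional one-hot vector
one_hot[c] = 1                     # 1 in the detected class position, 0 everywhere else
assert np.argmax(one_hot) == c     # argmax recovers the class index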

2 YOLO

YOLO (“you only look once”) is a popular algorithm because it achieves high accuracy while also running in real time. The algorithm “only looks once” at the image in the sense that it needs just one forward pass through the network to make its predictions. After non-max suppression, it outputs the recognized objects together with their bounding boxes.

2.1 Model Details

  • The input is a batch of images of shape (m, 608, 608, 3).
  • The output is a list of bounding boxes for the recognized objects. Each box is represented by 6 numbers (pc, bx, by, bh, bw, c), where c is an integer from 1 to 80. If you instead expand c into an 80-dimensional vector, each box is represented by 85 numbers.

We will use 5 anchor boxes, so the YOLO architecture can be thought of as: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85)

The figure below shows in more detail what this encoding represents.

image

If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.

Since we are using 5 anchor boxes, each of the 19x19 cells encodes information about 5 boxes. Anchor boxes are defined only by their width and height.

For simplicity, we flatten the last two dimensions of the (19, 19, 5, 85) encoding, so the output of the deep CNN is (19, 19, 425).

image
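This flattening is just a reshape of the last two dimensions; here is a small NumPy sketch of the idea (my own illustration, not assignment code):

import numpy as np

encoding = np.zeros((19, 19, 5, 85))       # dummy tensor with the encoding's shape
flattened = encoding.reshape(19, 19, -1)   # merge the last two dimensions: 5 * 85 = 425
print(flattened.shape)                     # (19, 19, 425)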

Now, for each anchor box of each cell, we compute the following element-wise product and extract a probability score that the box contains a certain class.

image

Here is one way to visualize what YOLO is predicting on an image:

  • For each of the 19x19 grid cells, find the maximum probability score (taking the maximum over the 80 classes and over the 5 anchor boxes).
  • Color each grid cell according to the class that cell considers most likely.

Doing this gives a picture like the one below:

image

Note that this coloring and visualization is not a core part of the YOLO prediction algorithm; it is just a nice way of inspecting the algorithm's intermediate results.
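A rough sketch of how this per-cell coloring could be computed from the scores, using NumPy on a dummy score tensor (my own illustration, not assignment code):

import numpy as np

box_scores = np.random.rand(19, 19, 5, 80)       # dummy scores: pc times the class probabilities
flat = box_scores.reshape(19, 19, -1)            # merge anchors and classes: (19, 19, 400)
best_score_per_cell = flat.max(axis=-1)          # (19, 19) highest score found in each cell
best_class_per_cell = flat.argmax(axis=-1) % 80  # class index of that score, used to pick the cell's color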

Another way to visualize YOLO's output is to plot the bounding boxes it predicts, with different colors for different classes and different shapes for different anchors.

image

In the figure above we only plotted boxes with relatively high scores, but there are still far too many boxes. You would like to filter the output down to a much smaller number of detected objects, which is what non-max suppression does. Specifically, you will carry out these steps:

  • Get rid of boxes with a low score (the box is not very confident about detecting any class).
  • When several overlapping boxes detect the same object, keep only the one with the highest score.

2.2 Filtering with a Threshold on Class Scores

The first filter gets rid of any box whose class score is below a chosen threshold.

The model gives you 19x19x5x85 numbers (using 80 numbers to represent the 80 classes), which are conveniently split into three tensors:
- box_confidence: tensor of shape (19, 19, 5, 1) containing pc, the confidence that each anchor box has detected some object
- boxes: tensor of shape (19, 19, 5, 4) containing (bx, by, bh, bw) for each box
- box_class_probs: tensor of shape (19, 19, 5, 80) containing the class probabilities (c1, c2, ..., c80) for each box
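This splitting is done for you by the yolo_head helper later on; conceptually it is just slicing the last dimension, as in the hypothetical NumPy sketch below (the 1 + 4 + 80 ordering is assumed purely for illustration, and the real helper also applies the appropriate activations and anchor scaling):

import numpy as np

encoding = np.random.rand(19, 19, 5, 85)   # dummy (19, 19, 5, 85) encoding
box_confidence  = encoding[..., 0:1]       # (19, 19, 5, 1)  pc             (assumed ordering)
boxes           = encoding[..., 1:5]       # (19, 19, 5, 4)  bx, by, bh, bw (assumed ordering)
box_class_probs = encoding[..., 5:]        # (19, 19, 5, 80) c1 ... c80     (assumed ordering)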

Exercise: implement yolo_filter_boxes()

Compute the element-wise product of pc and the class probabilities to obtain the box scores, as in the following example:

a = np.random.randn(19*19, 5, 1)
b = np.random.randn(19*19, 5, 80)
c = a * b # shape of c will be (19*19, 5, 80)
  1. For each box:
    1. Find the class with the highest box score (1 out of the 80 classes).
    2. Keep the corresponding score.
  2. Create a mask by using the threshold. For example, ([0.9, 0.3, 0.4, 0.5, 0.1] < 0.4) returns [False, True, False, False, True]. Note that the mask should be True for the boxes you want to keep.
  3. Use TensorFlow to apply the mask to box_class_scores, boxes, and box_classes to filter out the boxes you don't want.
# GRADED FUNCTION: yolo_filter_boxes

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    """Filters YOLO boxes by thresholding on object and class confidence.

    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1)
    boxes -- tensor of shape (19, 19, 5, 4)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box

    Returns:
    scores -- tensor of shape (None,), containing the class probability score for selected boxes
    boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
    classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes

    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold. 
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """

    # Step 1: Compute box scores
    ### START CODE HERE ### (≈ 1 line)
    box_scores = box_confidence * box_class_probs
    ### END CODE HERE ###

    # Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score
    ### START CODE HERE ### (≈ 2 lines)
    box_classes = K.argmax(box_scores, axis=-1)
    box_class_scores = K.max(box_scores, axis=-1, keepdims=False)
    ### END CODE HERE ###

    # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
    # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
    ### START CODE HERE ### (≈ 1 line)
    filtering_mask = box_class_scores >= threshold
    ### END CODE HERE ###

    # Step 4: Apply the mask to scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    ### END CODE HERE ###

    return scores, boxes, classes

#########################################################

with tf.Session() as test_a:
    box_confidence = tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1)
    boxes = tf.random_normal([19, 19, 5, 4], mean=1, stddev=4, seed = 1)
    box_class_probs = tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = 0.5)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.shape))
    print("boxes.shape = " + str(boxes.shape))
    print("classes.shape = " + str(classes.shape))

# scores[2] = 10.7506
# boxes[2] = [ 8.42653275  3.27136683 -0.5313437  -4.94137383]
# classes[2] = 7
# scores.shape = (?,)
# boxes.shape = (?, 4)
# classes.shape = (?,)

2.3 Non-max Suppression

Even after score-thresholding you are still left with many overlapping boxes. A second filter selects the correct box among each group of overlapping ones; this is called non-max suppression (NMS).

image

Non-max suppression relies on a very important function: Intersection over Union (IoU).

image

Exercise: implement iou()
  • In this exercise (and only here), we define a box using its two corners (upper-left and lower-right) rather than its center, width, and height.
  • The area of a box is computed as (y2 - y1) x (x2 - x1).
  • You also need to find the coordinates (xi1, yi1, xi2, yi2) of the intersection of the two boxes:
    • xi1 = maximum of the two boxes' x1 coordinates
    • yi1 = maximum of the two boxes' y1 coordinates
    • xi2 = minimum of the two boxes' x2 coordinates
    • yi2 = minimum of the two boxes' y2 coordinates

In the code below, we use the convention that (0, 0) is the upper-left corner of an image and (1, 1) is the lower-right corner.

# GRADED FUNCTION: iou

def iou(box1, box2):
    """Implement the intersection over union (IoU) between box1 and box2

    Arguments:
    box1 -- first box, list object with coordinates (x1, y1, x2, y2)
    box2 -- second box, list object with coordinates (x1, y1, x2, y2)
    """

    # Calculate the (y1, x1, y2, x2) coordinates of the intersection of box1 and box2. Calculate its Area.
    ### START CODE HERE ### (≈ 5 lines)
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])
    inter_area = (xi2 - xi1) * (yi2 - yi1)
    ### END CODE HERE ###    

    # Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
    ### START CODE HERE ### (≈ 3 lines)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = box1_area + box2_area - inter_area
    ### END CODE HERE ###

    # compute the IoU
    ### START CODE HERE ### (≈ 1 line)
    iou = inter_area / union_area
    ### END CODE HERE ###

    return iou

#########################################################

box1 = (2, 1, 4, 3)
box2 = (1, 2, 3, 4) 
print("iou = " + str(iou(box1, box2)))

# iou = 0.14285714285714285

You are now ready to implement non-max suppression. The key steps are:
1. Select the box with the highest score.
2. Compute its IoU with every other remaining box, and remove any box whose overlap (IoU) with it is greater than iou_threshold.
3. Go back to step 1 and repeat until there are no more boxes to process.

This removes every box that has a large overlap with a higher-scoring box, so only the best box for each object remains.
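For intuition only, here is a minimal pure-Python sketch of that loop, reusing the iou() defined above on plain lists of scores and corner-format boxes (the graded function below uses TensorFlow's built-in op instead):

def naive_non_max_suppression(scores, boxes, iou_threshold=0.5):
    # Sketch of the NMS loop described above; not the graded implementation.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # step 1: box with the highest remaining score
        keep.append(best)
        # step 2: drop every remaining box that overlaps the selected one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep              # indices of the surviving boxes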

Exercise: implement yolo_non_max_suppression() using TensorFlow.

Useful TensorFlow/Keras functions:

  • tf.image.non_max_suppression() # so you don't need to use your own iou() implementation
  • K.gather()
# GRADED FUNCTION: yolo_non_max_suppression

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """
    Applies Non-max suppression (NMS) to set of boxes

    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (, None), predicted score for each box
    boxes -- tensor of shape (4, None), predicted box coordinates
    classes -- tensor of shape (, None), predicted class for each box

    Note: The "None" dimension of the output tensors has obviously to be less than max_boxes. Note also that this
    function will transpose the shapes of scores, boxes, classes. This is made for convenience.
    """

    max_boxes_tensor = K.variable(max_boxes, dtype='int32')     # tensor to be used in tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor])) # initialize variable max_boxes_tensor

    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    ### START CODE HERE ### (≈ 1 line)
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold, name=None)
    ### END CODE HERE ###

    # Use K.gather() to select only nms_indices from scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)
    ### END CODE HERE ###

    return scores, boxes, classes

##############################################

with tf.Session() as test_b:
    scores = tf.random_normal([54,], mean=1, stddev=4, seed = 1)
    boxes = tf.random_normal([54, 4], mean=1, stddev=4, seed = 1)
    classes = tf.random_normal([54,], mean=1, stddev=4, seed = 1)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.eval().shape))
    print("boxes.shape = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))

# scores[2] = 6.9384
# boxes[2] = [-5.299932    3.13798141  4.45036697  0.95942086]
# classes[2] = -2.24527
# scores.shape = (10,)
# boxes.shape = (10, 4)
# classes.shape = (10,)

2.4 Wrapping Up the Filtering

It's time to implement a function that takes the output of the deep CNN (the (19, 19, 5, 85) encoding) and filters all the boxes using the functions you just implemented.

Exercise: implement yolo_eval()

yolo_eval() takes the output of the YOLO encoding and filters the boxes using score thresholding and non-max suppression.

There are several ways to represent a box, for example via the coordinates of its two corners (upper-left and lower-right) or via its center together with its width and height. YOLO converts between these representations at several points in the pipeline, using helpers such as the following:

# Convert (x, y, w, h) box coordinates to corner coordinates (x1, y1, x2, y2),
# which is the format expected by yolo_filter_boxes
boxes = yolo_boxes_to_corners(box_xy, box_wh)
# Rescale the boxes to the original image size
boxes = scale_boxes(boxes, image_shape)
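For reference, converting from center/size to two-corner coordinates is simple arithmetic; the sketch below is only illustrative, since the real yolo_boxes_to_corners helper from yad2k may order the coordinates differently:

def xywh_to_corners(box_xy, box_wh):
    # Illustrative only: turn center/size boxes into two-corner boxes.
    box_mins = box_xy - box_wh / 2.0    # upper-left corner  (x1, y1)
    box_maxes = box_xy + box_wh / 2.0   # lower-right corner (x2, y2)
    return box_mins, box_maxes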

Code:

# GRADED FUNCTION: yolo_eval

def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    """
    Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.

    Arguments:
    yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
                    box_confidence: tensor of shape (None, 19, 19, 5, 1)
                    box_xy: tensor of shape (None, 19, 19, 5, 2)
                    box_wh: tensor of shape (None, 19, 19, 5, 2)
                    box_class_probs: tensor of shape (None, 19, 19, 5, 80)
    image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype)
    max_boxes -- integer, maximum number of predicted boxes you'd like
    score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    """

    ### START CODE HERE ### 

    # Retrieve outputs of the YOLO model (≈1 line)
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs

    # Convert boxes to be ready for filtering functions 
    boxes = yolo_boxes_to_corners(box_xy, box_wh)

    # Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold (≈1 line)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, score_threshold)

    # Scale boxes back to original image shape.
    boxes = scale_boxes(boxes, image_shape)

    # Use one of the functions you've implemented to perform Non-max suppression with a threshold of iou_threshold (≈1 line)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)

    ### END CODE HERE ###

    return scores, boxes, classes

###############################################

with tf.Session() as test_b:
    yolo_outputs = (tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1),
                    tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
                    tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
                    tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1))
    scores, boxes, classes = yolo_eval(yolo_outputs)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.eval().shape))
    print("boxes.shape = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))

# scores[2] = 138.791
# boxes[2] = [ 1292.32971191  -278.52166748  3876.98925781  -835.56494141]
# classes[2] = 54
# scores.shape = (10,)
# boxes.shape = (10, 4)
# classes.shape = (10,)

Summary of YOLO

  • The input image is (608, 608, 3).
  • The input image goes through a CNN, resulting in a (19, 19, 5, 85) output.
  • After flattening the last two dimensions, the output is a volume of shape (19, 19, 425).
  • Each cell in the 19x19 grid contains 425 numbers.
  • 425 = 5 x 85, because each cell contains predictions for 5 boxes, corresponding to 5 anchor boxes.
  • 85 = 5 + 80, where 5 is for (pc, bx, by, bh, bw) and 80 is the number of classes we want to detect.
  • You then select only a few boxes based on:
    • Score thresholding: throw away boxes whose best class score is below a threshold.
    • Non-max suppression: compute IoU to avoid selecting overlapping boxes that detect the same object.
  • This gives you YOLO's final output.

3 Testing a Pretrained YOLO Model on Images

Create a session:

sess = K.get_session()

3.1 Defining Classes, Anchors, and Image Shape

The class names and anchor boxes are stored in two separate files. Also, the original images are 720 x 1280; we preprocess them to 608 x 608 before feeding them to the model.

class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
image_shape = (720., 1280.)   

3.2 Loading the Pretrained Model

The model weights come from the official YOLO website and are stored in the file yolo.h5.

yolo_model = load_model("model_data/yolo.h5")
yolo_model.summary()

Reminder: this model converts a preprocessed batch of input images of shape (m, 608, 608, 3) into a tensor of shape (m, 19, 19, 5, 85).

3.3 Converting the Model Output into Usable Bounding-Box Tensors

yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))

Next, yolo_outputs is passed to the yolo_eval function.

3.4 Filtering Boxes

yolo_outputs is already in the right format, so we call the yolo_eval function implemented earlier to select the best boxes:

scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)

3.5 Running the Graph on an Image

The steps are:
1. Create a session.
2. yolo_model.input is fed to yolo_model, which computes the output yolo_model.output.
3. yolo_model.output is processed by yolo_head, giving yolo_outputs.
4. yolo_outputs goes through the filtering function yolo_eval, which outputs the predictions: scores, boxes, classes.

Exercise: implement the prediction function predict()

Hint: the following helper is used to preprocess the image:

image, image_data = preprocess_image("images/" + image_file, model_image_size = (608, 608))

It outputs:

  • image: a Python (PIL) representation of the image, used for drawing boxes on it; you won't need to use it much yourself.
  • image_data: a numpy array representing the image, which will be fed as input to the CNN.

Important note: when a model uses BatchNorm (as this one does), the feed_dict needs one extra placeholder, {K.learning_phase(): 0}.

def predict(sess, image_file):
    """
    Runs the graph stored in "sess" to predict boxes for "image_file". Prints and plots the predictions.

    Arguments:
    sess -- your tensorflow/Keras session containing the YOLO graph
    image_file -- name of an image stored in the "images" folder.

    Returns:
    out_scores -- tensor of shape (None, ), scores of the predicted boxes
    out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
    out_classes -- tensor of shape (None, ), class index of the predicted boxes

    Note: "None" actually represents the number of predicted boxes, it varies between 0 and max_boxes. 
    """

    # Preprocess your image
    image, image_data = preprocess_image("images/" + image_file, model_image_size = (608, 608))

    # Run the session with the correct tensors and choose the correct placeholders in the feed_dict.
    # You'll need to use feed_dict={yolo_model.input: ... , K.learning_phase(): 0})
    ### START CODE HERE ### (≈ 1 line)
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes], feed_dict = {yolo_model.input:image_data, K.learning_phase(): 0})
    ### END CODE HERE ###

    # Print predictions info
    print('Found {} boxes for {}'.format(len(out_boxes), image_file))
    # Generate colors for drawing bounding boxes.
    colors = generate_colors(class_names)
    # Draw bounding boxes on the image file
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    # Save the predicted bounding box on the image
    image.save(os.path.join("out", image_file), quality=90)
    # Display the results in the notebook
    output_image = scipy.misc.imread(os.path.join("out", image_file))
    imshow(output_image)

    return out_scores, out_boxes, out_classes

########################################################
# Run the prediction on test.jpg
out_scores, out_boxes, out_classes = predict(sess, "test.jpg")

# Found 7 boxes for test.jpg
# car 0.60 (925, 285) (1045, 374)
# car 0.66 (706, 279) (786, 350)
# bus 0.67 (5, 266) (220, 407)
# car 0.70 (947, 324) (1280, 705)
# car 0.74 (159, 303) (346, 440)
# car 0.80 (761, 282) (942, 412)
# car 0.89 (367, 300) (745, 648)

The model you just ran can detect the 80 classes listed in coco_classes.txt. Feel free to try it on your own images.
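For example, assuming you drop a photo of your own (here the hypothetical file name my_street.jpg) into the images folder, you can run:

out_scores, out_boxes, out_classes = predict(sess, "my_street.jpg")  # hypothetical file placed in "images/"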

What you should remember:

  • YOLO is a state-of-the-art object detection model that is both fast and accurate.
  • It runs the input image through a CNN, which outputs a 19x19x5x85 volume.
  • The encoding can be seen as a 19x19 grid in which each cell contains information about 5 boxes.
  • You then filter all the boxes using non-max suppression:
    • Score thresholding discards low-confidence detections and keeps only the high-scoring ones.
    • IoU thresholding eliminates overlapping boxes.

Training YOLO from randomly initialized weights would require a very large dataset and a great deal of computation, which is why we used pretrained weights here. You can also try fine-tuning the model on your own dataset, but that is far from a trivial exercise.

References

The YOLO ideas discussed in this assignment come mainly from the two YOLO papers.
The implementation borrows heavily from Allan Zelener's GitHub repository (yad2k).
The pretrained weights come from the official YOLO website.

The car detection dataset was provided by drive.ai, which retains the copyright; we thank them here for their help.

Reposted from blog.csdn.net/haoyutiangang/article/details/81074799