[Deep Learning 4.3] Building the YOLO Object Detection Algorithm

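Each training image in the training set is labeled as follows:

If YOLO has to recognize 80 kinds of objects, the class label c can be written either as an integer between 1 and 80 or as an 80-dimensional vector in which the entry for the detected class is 1 and all other entries are 0.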
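The two encodings of c carry the same information. As a minimal sketch, the snippet below converts the integer form into the 80-dimensional one-hot form. The helper name `one_hot` and the example label 3 are illustrative and do not come from the course code.

```python
NUM_CLASSES = 80

def one_hot(c, num_classes=NUM_CLASSES):
    """Turn an integer label c in 1..num_classes into a one-hot list."""
    if not 1 <= c <= num_classes:
        raise ValueError("class label out of range")
    vec = [0] * num_classes
    vec[c - 1] = 1  # labels start at 1, list indices start at 0
    return vec

label = one_hot(3)
print(len(label), sum(label), label[:5])
```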

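The YOLO model

Input: (m, 608, 608, 3)

Output: the bounding boxes of the recognized objects, each described by (pc, bx, by, bw, bh, c). If c is expanded into an 80-dimensional vector, each box is represented by 85 numbers.

This example uses 5 anchor boxes, so the model is IMAGE (m, 608, 608, 3) -> deep CNN -> ENCODING (m, 19, 19, 5, 85)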

Details of the ENCODING

Explanation: if the center of an object falls into a grid cell, that grid cell is responsible for detecting the object.
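A small sketch of how an object could be assigned to its grid cell. It assumes the box center (bx, by) is normalized to [0, 1) relative to the whole image, which the notes do not state explicitly; `responsible_cell` is a hypothetical helper name.

```python
GRID = 19

def responsible_cell(bx, by, grid=GRID):
    """Return (row, col) of the grid cell that contains the box center.

    Assumes bx and by are normalized to [0, 1) relative to the image.
    """
    col = min(int(bx * grid), grid - 1)
    row = min(int(by * grid), grid - 1)
    return row, col

# A center at 50% of the width and 25% of the height
print(responsible_cell(0.5, 0.25))
```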

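Because this example uses 5 anchor boxes, each of the 19x19 grid cells encodes 5 boxes. For convenience, the last two dimensions of (m, 19, 19, 5, 85) are flattened, giving (m, 19, 19, 425).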
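The flattening is plain index arithmetic: 5 boxes of 85 numbers each give 425 numbers per cell. The sketch below assumes the anchor index varies slowest, as an ordinary row-major reshape would produce; `flat_index` and `unflatten` are illustrative names.

```python
NUM_ANCHORS = 5
BOX_LEN = 85  # pc, bx, by, bw, bh and 80 class values

def flat_index(anchor, k, box_len=BOX_LEN):
    """Position of value k of the given anchor box in the flat 425-vector."""
    return anchor * box_len + k

def unflatten(cell_vector, num_anchors=NUM_ANCHORS, box_len=BOX_LEN):
    """Split one cell's flat vector back into num_anchors boxes."""
    return [cell_vector[a * box_len:(a + 1) * box_len]
            for a in range(num_anchors)]

cell = list(range(NUM_ANCHORS * BOX_LEN))  # stand-in for one cell's 425 values
boxes = unflatten(cell)
print(len(boxes), len(boxes[0]), boxes[2][3] == cell[flat_index(2, 3)])
```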

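Then, for each anchor box, compute the class scores by multiplying pc by each of the 80 class probabilities.

For each of the 19x19 grid cells, find the maximum score over its 5 anchor boxes and all classes.

Color each grid cell according to the class of its highest-scoring anchor box.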
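As a rough illustration of the per-cell maximum, the sketch below scores every (anchor, class) pair of one cell as pc times the class probability and keeps the best one. It uses 2 anchors and 3 classes instead of 5 and 80 to stay readable; the function name and the numbers are made up for the example.

```python
def best_score_in_cell(cell_boxes):
    """cell_boxes: list of (pc, class_probs) pairs, one per anchor box.

    Returns (best score, anchor index, class index) for the cell.
    """
    best = (0.0, None, None)
    for anchor, (pc, class_probs) in enumerate(cell_boxes):
        for cls, prob in enumerate(class_probs):
            score = pc * prob  # class score = object confidence * class probability
            if score > best[0]:
                best = (score, anchor, cls)
    return best

cell = [
    (0.9, [0.1, 0.7, 0.2]),  # anchor 0
    (0.4, [0.8, 0.1, 0.1]),  # anchor 1
]
score, anchor, cls = best_score_in_cell(cell)
print(round(score, 2), anchor, cls)
```

The class index found this way is what decides the color of the cell in the visualization.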

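Another way to visualize YOLO's output is to draw the bounding boxes it predicts.

A single forward pass predicts 19*19*5 = 1805 anchor boxes, and the boxes are drawn in different colors according to the detected class.

This is still far too many boxes, so non-max suppression is used to filter the output:

1. Reduce the number of output boxes by discarding those with a low score.

2. When several boxes cover the same object, keep only one of them.

Filter on the class scores with a threshold, discarding every anchor box whose score falls below it.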

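The model outputs 19*19*5*85 numbers in total, with each box described by 85 numbers. It is more convenient to split the (19, 19, 5, 85) tensor into the following pieces:

box_confidence (19, 19, 5, 1) contains pc, the confidence that an object is present, for each of the 5 anchor boxes in every cell

boxes (19, 19, 5, 4) contains the position of each anchor box (bx, by, bw, bh)

box_class_probs (19, 19, 5, 80) contains the probability of each of the 80 classes for every anchor box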
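For a single box the split is just slicing. The sketch below assumes the 85 numbers are laid out as (pc, bx, by, bw, bh, c1..c80), following the order in which the box is described earlier in these notes; `split_box` is a hypothetical helper.

```python
def split_box(box):
    """Split one 85-number box into confidence, coordinates and class values.

    Assumed layout: [pc, bx, by, bw, bh, c1, ..., c80].
    """
    if len(box) != 85:
        raise ValueError("expected 85 values per box")
    box_confidence = box[0:1]   # 1 value
    coords = box[1:5]           # 4 values: bx, by, bw, bh
    box_class_probs = box[5:]   # 80 values
    return box_confidence, coords, box_class_probs

example_box = [0.8, 0.5, 0.5, 0.2, 0.4] + [0.0] * 80
conf, coords, probs = split_box(example_box)
print(len(conf), len(coords), len(probs))
```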

Code:

import tensorflow as tf
from keras import backend as K

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    """Filters YOLO boxes by thresholding on object and class confidence.

    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1)
    boxes -- tensor of shape (19, 19, 5, 4)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box

    Returns:
    scores -- tensor of shape (None,), containing the class probability score for selected boxes
    boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
    classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes

    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold.
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """

    # Step 1: compute the box scores (object confidence times class probabilities)
    box_scores = box_confidence * box_class_probs

    # Step 2: for each box, find the class with the highest score and that score
    box_classes = K.argmax(box_scores, axis=-1)
    box_class_scores = K.max(box_scores, axis=-1)

    # Step 3: build the mask (True for boxes whose best score reaches the threshold)
    filtering_mask = box_class_scores >= threshold

    # Step 4: apply the mask to scores, boxes and classes
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)

    return scores, boxes, classes
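To make the four steps easier to follow without a TensorFlow session, here is a plain-Python sketch of the same thresholding on a flat list of three boxes with two classes. It is an illustration only, not the course implementation; `filter_boxes` and all the numbers are made up for the example.

```python
def filter_boxes(box_confidence, boxes, box_class_probs, threshold=0.6):
    """Keep the boxes whose best class score is at least threshold."""
    scores, kept_boxes, classes = [], [], []
    for pc, box, probs in zip(box_confidence, boxes, box_class_probs):
        box_scores = [pc * p for p in probs]                 # step 1
        best_class = max(range(len(box_scores)),
                         key=box_scores.__getitem__)         # step 2
        best_score = box_scores[best_class]
        if best_score >= threshold:                          # steps 3 and 4
            scores.append(best_score)
            kept_boxes.append(box)
            classes.append(best_class)
    return scores, kept_boxes, classes

confidences = [0.9, 0.5, 0.8]
boxes = [(0.1, 0.1, 0.3, 0.3), (0.2, 0.2, 0.5, 0.5), (0.6, 0.6, 0.9, 0.9)]
class_probs = [[0.1, 0.9], [0.6, 0.4], [0.8, 0.2]]

scores, kept, classes = filter_boxes(confidences, boxes, class_probs)
print(len(kept), classes)
```

The second box is dropped because its best score, 0.5 * 0.6 = 0.3, is below the threshold of 0.6.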

After this filtering step some boxes still overlap. Non-max suppression is applied to eliminate the overlapping boxes.

Code:

def iou(box1, box2):
    """Computes the intersection over union (IoU) of box1 and box2.

    Arguments:
    box1 -- first box, list object with coordinates (x1, y1, x2, y2)
    box2 -- second box, list object with coordinates (x1, y1, x2, y2)
    """

    # Calculate the (x1, y1, x2, y2) coordinates of the intersection of box1 and box2. Calculate its Area.
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])
    # Clamp at 0 so that boxes that do not overlap get an intersection area of 0
    inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)

    # Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = (box1_area + box2_area) - inter_area

    # compute the IoU
    return inter_area / union_area
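A short worked example of the IoU formula. The block carries its own compact version of the function so that it runs on its own; the boxes are made-up values. It also shows why the intersection width and height should be clamped at zero: for two disjoint boxes both differences are negative, and their product would otherwise look like a positive area.

```python
def iou(box1, box2):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    inter_w = max(min(box1[2], box2[2]) - max(box1[0], box2[0]), 0)
    inter_h = max(min(box1[3], box2[3]) - max(box1[1], box2[1]), 0)
    inter_area = inter_w * inter_h
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter_area / (area1 + area2 - inter_area)

# Two 2x2 boxes sharing a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7
overlapping = iou((2, 1, 4, 3), (1, 2, 3, 4))

# Two boxes that do not touch: IoU = 0
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))

print(overlapping, disjoint)
```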

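The key steps of the NMS algorithm are:

1. Select the box with the highest score.

2. Remove every box whose overlap (IoU) with that box is higher than iou_threshold.

3. Repeat steps 1 and 2 on the remaining boxes until no box is left to process.

These steps remove all boxes that overlap heavily with a selected box, so only the best box is kept for each object.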
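The steps can be sketched in plain Python as a greedy loop. This is a minimal illustration under the assumption that boxes are (x1, y1, x2, y2) tuples; it is not the TensorFlow routine, and the function names and sample boxes are made up.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def non_max_suppression(boxes, scores, iou_threshold=0.5, max_boxes=10):
    """Return the indices of the boxes kept by greedy NMS."""
    # Candidate indices, highest score first
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < max_boxes:
        best = order.pop(0)          # step 1: highest remaining score
        keep.append(best)
        # step 2: drop candidates that overlap the selected box too much
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep                      # step 3 is the loop itself

boxes = [(0, 0, 2, 2), (0.1, 0.1, 2.1, 2.1), (5, 5, 7, 7)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box lies elsewhere and is kept.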

Implementing yolo_non_max_suppression with TensorFlow

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """
    Applies Non-max suppression (NMS) to set of boxes

    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (None,), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box

    Note: The "None" dimension of the output tensors is at most max_boxes.
    """

    max_boxes_tensor = K.variable(max_boxes, dtype='int32') # tensor to be used in tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor])) # initialize variable max_boxes_tensor

    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold)

    # Use K.gather() to select only nms_indices from scores, boxes and classes
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)

    return scores, boxes, classes

Processing the boxes output by the YOLO encoding with the score threshold and NMS

def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    """
    Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.

    Arguments:
    yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
                    box_confidence: tensor of shape (None, 19, 19, 5, 1)
                    box_xy: tensor of shape (None, 19, 19, 5, 2)
                    box_wh: tensor of shape (None, 19, 19, 5, 2)
                    box_class_probs: tensor of shape (None, 19, 19, 5, 80)
    image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype)
    max_boxes -- integer, maximum number of predicted boxes you'd like
    score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering

    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    """

    # Retrieve outputs of the YOLO model
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs

    # Convert boxes to be ready for filtering functions
    boxes = yolo_boxes_to_corners(box_xy, box_wh)

    # Score-filtering with a threshold of score_threshold
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, score_threshold)

    # Scale boxes back to original image shape.
    boxes = scale_boxes(boxes, image_shape)

    # Non-max suppression with a threshold of iou_threshold
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)

    return scores, boxes, classes
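yolo_eval relies on two helpers, yolo_boxes_to_corners and scale_boxes, whose code is not shown in these notes. The sketch below illustrates what such helpers plausibly do for a single box. It assumes that box centers and sizes are normalized to the image and that corners are ordered (y1, x1, y2, x2), the order tf.image.non_max_suppression expects. The single-box function names are hypothetical and the real helpers may differ.

```python
def box_to_corners(bx, by, bw, bh):
    """(center x, center y, width, height) -> (y1, x1, y2, x2)."""
    return (by - bh / 2, bx - bw / 2, by + bh / 2, bx + bw / 2)

def scale_box(corners, image_shape):
    """Scale normalized (y1, x1, y2, x2) corners to pixel coordinates."""
    height, width = image_shape
    y1, x1, y2, x2 = corners
    return (y1 * height, x1 * width, y2 * height, x2 * width)

corners = box_to_corners(0.5, 0.5, 0.2, 0.4)   # centered box
pixels = scale_box(corners, (720., 1280.))      # (height, width) of the image
print(corners, pixels)
```

With a 720 by 1280 image the centered box above spans roughly rows 216 to 504 and columns 512 to 768.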

Reference: Andrew Ng's deep learning course.
