[Deep Learning_4.3] Building a YOLO Object Recognition Algorithm

The training images in the training set are labeled as follows


If the YOLO algorithm needs to recognize 80 classes of objects, the class label c can be represented either as an integer from 1 to 80 or as an 80-dimensional vector in which the entry for the recognized class is 1 and all other entries are 0.
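The two label representations are interchangeable. A minimal sketch of the conversion in plain Python, assuming classes are numbered from 1 to 80 as described above (the helper name to_one_hot is hypothetical):

```python
def to_one_hot(c, num_classes=80):
    """Convert a class label c in 1..num_classes to a one-hot list."""
    if not 1 <= c <= num_classes:
        raise ValueError("class label out of range")
    vec = [0] * num_classes
    vec[c - 1] = 1  # labels start at 1, list indices start at 0
    return vec


label = to_one_hot(3)
print(len(label), label[:5])  # prints: 80 [0, 0, 1, 0, 0]
```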

YOLO algorithm model

Input: a batch of images with shape (m, 608, 608, 3)

Output: a list of bounding boxes for the recognized objects, each described by (pc,bx,by,bw,bh,c). If c is expanded into an 80-dimensional vector, each box is represented by 85 numbers

The example uses 5 anchor boxes, so the model is IMAGE (m, 608, 608, 3) -> deep CNN -> ENCODING (m, 19, 19, 5, 85)

ENCODING details explained


Explanation: if the center of an object falls into a grid cell, that grid cell is responsible for detecting the object

Because 5 anchor boxes are used in this example, each of the 19*19 grid cells encodes 5 boxes. For convenience, flatten the last two dimensions, turning (m,19,19,5,85) into (m,19,19,425)
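The flattening step can be illustrated with NumPy. This is only a sketch: the batch size m = 2 and the zero-filled array are placeholders, not real model output.

```python
import numpy as np

m = 2  # hypothetical batch size
encoding = np.zeros((m, 19, 19, 5, 85), dtype=np.float32)
encoding[0, 4, 7, 2, 0] = 1.0  # first value of anchor box 2 in cell (4, 7) of image 0

# Merge the anchor axis (5) and the per-box axis (85) into one axis of length 425
flat = encoding.reshape(m, 19, 19, 5 * 85)

# After flattening, anchor box k occupies entries k*85 .. k*85+84 of the last axis
print(flat.shape)
print(flat[0, 4, 7, 2 * 85 + 0])
```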


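For each anchor box, compute the class scores: multiply pc by each of the 80 class probabilities, which gives the probability that the box contains each class


For each of the 19*19 grid cells, find the maximum score over its 5 anchor boxes and 80 classes

Color each grid cell according to the class that achieves this maximum score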
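The per-cell maximum described above can be sketched in NumPy. The random arrays below stand in for a real model output, and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
pc = rng.random((19, 19, 5, 1))             # object confidence per anchor box
class_probs = rng.random((19, 19, 5, 80))   # class probabilities per anchor box

scores = pc * class_probs                   # (19, 19, 5, 80) through broadcasting

# Collapse the anchor and class axes so each cell has 5*80 candidate scores
per_cell = scores.reshape(19, 19, 5 * 80)
best_score = per_cell.max(axis=-1)          # (19, 19): highest score in each cell
# Flattened index = anchor*80 + class, so the class index is the remainder mod 80
best_class = per_cell.argmax(axis=-1) % 80  # (19, 19): class with that score
```

A visualization would then map each value of best_class to a color.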

Another way to visualize the output of YOLO is to draw the bounding boxes that it predicts

A single forward pass produces 19*19*5 = 1805 anchor boxes; boxes for different objects are drawn in different colors


That is still far too many boxes, so the output is filtered with a score threshold and non-max suppression:

1. Discard boxes with a low score, which reduces the number of boxes output for recognized objects

2. When several boxes cover the same object, keep only one of them

The first filter applies a threshold to the class scores: any anchor box whose highest class score is below the threshold is discarded

The model outputs 19*19*5*85 numbers in total, and each box is described by 85 numbers. It is more convenient to split the (19,19,5,85) tensor into the following parts

box_confidence (19,19,5,1) contains pc, the confidence that an object is present, for each of the 5 anchor boxes in every grid cell

boxes (19,19,5,4) contains the location (bx,by,bw,bh) of each anchor box

box_class_probs (19,19,5,80) contains the probabilities of the 80 classes for each anchor box
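A sketch of this split using NumPy slicing. It assumes the 85 values of each box are stored in the order (pc, bx, by, bw, bh, c1..c80); the actual model may post-process its raw output before producing these tensors.

```python
import numpy as np

# Placeholder for one image's encoding; real values would come from the CNN
encoding = np.random.default_rng(1).random((19, 19, 5, 85))

box_confidence = encoding[..., 0:1]   # pc, shape (19, 19, 5, 1)
boxes = encoding[..., 1:5]            # (bx, by, bw, bh), shape (19, 19, 5, 4)
box_class_probs = encoding[..., 5:]   # 80 class probabilities, shape (19, 19, 5, 80)

print(box_confidence.shape, boxes.shape, box_class_probs.shape)
```

Slicing with 0:1 rather than 0 keeps the trailing axis of length 1, so box_confidence still broadcasts against box_class_probs.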


Code:

import tensorflow as tf
from keras import backend as K

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):

    """Filters YOLO boxes by thresholding on object and class confidence.
    
    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1)
    boxes -- tensor of shape (19, 19, 5, 4)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    
    Returns:
    scores -- tensor of shape (None,), containing the class probability score for selected boxes
    boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
    classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes
    
    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold. 
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """
    
    # Step 1: Compute the box scores
    box_scores = box_confidence * box_class_probs

    # Step 2: For each box, find the class with the largest score and the value of that score
    box_classes = K.argmax(box_scores, axis=-1)
    box_class_scores = K.max(box_scores, axis=-1)

    # Step 3: Build a boolean mask that is True for the boxes to keep
    filtering_mask = box_class_scores >= threshold

    # Step 4: Apply the mask to scores, boxes and classes
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)

    return scores, boxes, classes
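The same filtering logic can be tried without a TensorFlow session. The NumPy sketch below mirrors the four steps; filter_boxes_np is a hypothetical name and the random inputs stand in for real model output.

```python
import numpy as np


def filter_boxes_np(box_confidence, boxes, box_class_probs, threshold=0.6):
    """NumPy counterpart of the score-threshold filter (illustrative only)."""
    box_scores = box_confidence * box_class_probs   # (19, 19, 5, 80)
    box_classes = box_scores.argmax(axis=-1)        # (19, 19, 5)
    box_class_scores = box_scores.max(axis=-1)      # (19, 19, 5)
    mask = box_class_scores >= threshold            # True for boxes to keep
    # Boolean indexing flattens the three leading axes into one
    return box_class_scores[mask], boxes[mask], box_classes[mask]


rng = np.random.default_rng(0)
scores, kept_boxes, classes = filter_boxes_np(
    rng.random((19, 19, 5, 1)),
    rng.random((19, 19, 5, 4)),
    rng.random((19, 19, 5, 80)),
    threshold=0.6,
)
print(scores.shape, kept_boxes.shape, classes.shape)
```

The number of rows in each output depends on the threshold, which is why the TensorFlow shapes are written with None.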

After this filtering step some boxes still overlap. The overlapping boxes are eliminated with non-max suppression, which measures overlap using intersection over union (IoU).


Code

def iou(box1, box2):
    """Compute the intersection over union (IoU) between box1 and box2.

    Arguments:
    box1 -- first box, list object with coordinates (x1, y1, x2, y2)
    box2 -- second box, list object with coordinates (x1, y1, x2, y2)
    """

    # Calculate the (x1, y1, x2, y2) coordinates of the intersection of box1 and box2, then its area.
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])
    # Clamp at 0 so that boxes that do not overlap get an intersection area of 0
    inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)

    # Calculate the union area using the formula: Union(A,B) = A + B - Inter(A,B)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = (box1_area + box2_area) - inter_area

    # Compute the IoU
    iou = inter_area / union_area

    return iou
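A small worked example of the IoU formula. The function is repeated under the hypothetical name iou_xyxy so that the snippet runs on its own. Clamping each side of the intersection at zero matters for disjoint boxes: without it, two negative side lengths would multiply into a positive area.

```python
def iou_xyxy(box1, box2):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    xi1, yi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    xi2, yi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)


# Two 2x2 boxes sharing a 1x1 square: intersection 1, union 4 + 4 - 1 = 7
overlap = iou_xyxy((2, 1, 4, 3), (1, 2, 3, 4))
# Two unit boxes that do not touch: intersection 0
disjoint = iou_xyxy((0, 0, 1, 1), (2, 2, 3, 3))
print(overlap, disjoint)
```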

The key steps of the NMS algorithm are:

1. Select the box with the highest score

2. Remove every box whose overlap (IoU) with that box is greater than iou_threshold

3. Repeat steps 1 and 2 on the remaining boxes until no unprocessed box is left

Through these steps, boxes that overlap heavily with a better box are removed, and only the best box for each object is retained
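The steps above can be sketched in plain Python. This is a simple greedy version for illustration, not the TensorFlow implementation; the names nms and iou_xyxy are hypothetical, and boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms(boxes, scores, iou_threshold=0.5, max_boxes=10):
    """Return the indices of the boxes kept by greedy non-max suppression."""
    # Candidate indices, sorted from highest to lowest score
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < max_boxes:
        best = order.pop(0)          # step 1: highest-scoring remaining box
        keep.append(best)
        # step 2: drop boxes that overlap the selected box too much
        order = [i for i in order
                 if iou_xyxy(boxes[best], boxes[i]) <= iou_threshold]
    return keep                      # step 3 is the loop itself


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # prints: [0, 2]
```

In this example the second box has an IoU of 81/119 (about 0.68) with the first and is suppressed, while the third box does not overlap either of them and is kept.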


Implement yolo_non_max_suppression with TensorFlow

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """
    Applies Non-max suppression (NMS) to set of boxes
    
    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
    
    Returns:
    scores -- tensor of shape (None,), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    
    Note: The "None" dimension of the output tensors is at most max_boxes.
    """
    
    max_boxes_tensor = K.variable(max_boxes, dtype='int32')     # tensor to be used in tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor])) # initialize variable max_boxes_tensor (TensorFlow 1.x session API)
    
    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold)
    
    # Use K.gather() to select only nms_indices from scores, boxes and classes
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)

    return scores, boxes, classes

Finally, apply the score threshold and the NMS algorithm to the boxes output by the YOLO encoding

def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    """
    Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.
    
    Arguments:
    yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
                    box_confidence: tensor of shape (None, 19, 19, 5, 1)
                    box_xy: tensor of shape (None, 19, 19, 5, 2)
                    box_wh: tensor of shape (None, 19, 19, 5, 2)
                    box_class_probs: tensor of shape (None, 19, 19, 5, 80)
    image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype)
    max_boxes -- integer, maximum number of predicted boxes you'd like
    score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
    
    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    """
    
    # Retrieve the outputs of the YOLO model
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs

    # Convert boxes to corner coordinates, ready for the filtering functions
    # (yolo_boxes_to_corners is a helper that is not shown in this article)
    boxes = yolo_boxes_to_corners(box_xy, box_wh)

    # Score filtering with a threshold of score_threshold
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, score_threshold)

    # Scale boxes back to the original image shape
    # (scale_boxes is a helper that is not shown in this article)
    boxes = scale_boxes(boxes, image_shape)

    # Non-max suppression with a threshold of iou_threshold
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)

    return scores, boxes, classes
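yolo_boxes_to_corners and scale_boxes are not defined in this article. The NumPy sketch below shows roughly what such helpers might do. It assumes boxes are given as normalized center coordinates and sizes and that corners are ordered (x1, y1, x2, y2); the real helpers may use a different ordering, and the names boxes_to_corners_np and scale_boxes_np are hypothetical.

```python
import numpy as np


def boxes_to_corners_np(box_xy, box_wh):
    """Convert (center, size) boxes to (x1, y1, x2, y2) corners."""
    mins = box_xy - box_wh / 2.0
    maxes = box_xy + box_wh / 2.0
    return np.concatenate([mins, maxes], axis=-1)


def scale_boxes_np(boxes, image_shape):
    """Scale normalized corner boxes to pixels; image_shape is (height, width)."""
    height, width = image_shape
    return boxes * np.array([width, height, width, height])


# One box centered in the image, 20% of its width and 40% of its height
corners = boxes_to_corners_np(np.array([[0.5, 0.5]]), np.array([[0.2, 0.4]]))
scaled = scale_boxes_np(corners, (720.0, 1280.0))
print(corners)
print(scaled)
```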


Reference: Andrew Ng's deep learning course.
