TensorFlow Faster R-CNN source code analysis (TFFRCNN) (11): gt_data_layer/minibatch.py

This post is part of a series of notes analyzing the source code of the CharlesShang/TFFRCNN repository on GitHub.

--------------- personal study notes ---------------

---------------- Author: Wu Jiang ----------------


This file is similar to roi_data_layer/minibatch.py; some of the functions in it may never be executed.

"""Compute minibatch blobs for training a Fast R-CNN network."""

1.get_minibatch(roidb, num_classes)

Updates the 'info_boxes' field of each roidb[i] (its contents are unknown to me, including what the hard-coded 18 means) and returns it in the blobs dict together with two new fields: 'data' (the image data blob) and 'parameters'. The parameters are num_scale (the number of image scales), num_aspect (the number of aspect ratios), cfg.TRAIN.SCALES, cfg.TRAIN.SCALE_MAPPING, cfg.TRAIN.ASPECT_HEIGHTS and cfg.TRAIN.ASPECT_WIDTHS. The last three do not seem to be defined in the config, so reading them should raise an error; it is also possible that this function is simply never executed. It is called by _get_next_minibatch(...) in gt_data_layer/layer.py.

# Update the 'info_boxes' field of roidb[i]; add the 'data' and 'parameters' fields
def get_minibatch(roidb, num_classes):
    """Given a roidb, construct a minibatch sampled from it."""
    num_images = len(roidb)
    # Default TRAIN.BATCH_SIZE = 128
    assert(cfg.TRAIN.BATCH_SIZE % num_images == 0), \
        'num_images ({}) must divide BATCH_SIZE ({})'. \
        format(num_images, cfg.TRAIN.BATCH_SIZE)
    # Get the input image blob, formatted for caffe
    im_blob = _get_image_blob(roidb)

    # build the box information blob
    # 18 is hard-coded here; what does it refer to?
    info_boxes_blob = np.zeros((0, 18), dtype=np.float32)
    # Default TRAIN.SCALES = (600,)
    num_scale = len(cfg.TRAIN.SCALES)
    for i in xrange(num_images):
        info_boxes = roidb[i]['info_boxes']
        # change the batch index
        # Why is this done? What do columns 3 and 8 (indices 2 and 7) represent?
        info_boxes[:, 2] += i * num_scale
        info_boxes[:, 7] += i * num_scale
        info_boxes_blob = np.vstack((info_boxes_blob, info_boxes))

    # build the parameter blob
    # Default TRAIN.ASPECTS = (1,), only one? (aspect ratio to use during training)
    num_aspect = len(cfg.TRAIN.ASPECTS)
    num = 2 + 2 * num_scale + 2 * num_aspect  # 6?
    # parameters_blob stores the following parameters:
    # num_scale: number of image scales, len(cfg.TRAIN.SCALES) = 1
    # num_aspect: number of aspect ratios, len(cfg.TRAIN.ASPECTS) = 1
    # cfg.TRAIN.SCALES (600,)
    # cfg.TRAIN.SCALE_MAPPING: no such value, would this raise an error? Perhaps this function is never called
    # cfg.TRAIN.ASPECT_HEIGHTS: no such value, would this raise an error?
    # cfg.TRAIN.ASPECT_WIDTHS: no such value, would this raise an error?
    parameters_blob = np.zeros((num), dtype=np.float32)
    parameters_blob[0] = num_scale
    parameters_blob[1] = num_aspect
    parameters_blob[2:2+num_scale] = cfg.TRAIN.SCALES
    parameters_blob[2+num_scale:2+2*num_scale] = cfg.TRAIN.SCALE_MAPPING
    parameters_blob[2+2*num_scale:2+2*num_scale+num_aspect] = cfg.TRAIN.ASPECT_HEIGHTS
    parameters_blob[2+2*num_scale+num_aspect:2+2*num_scale+2*num_aspect] = cfg.TRAIN.ASPECT_WIDTHS
    # For debug visualizations
    # _vis_minibatch(im_blob, rois_blob, labels_blob, sublabels_blob)
    blobs = {'data': im_blob,
             'info_boxes': info_boxes_blob,
             'parameters': parameters_blob}
    return blobs
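
To make the layout of parameters_blob easier to follow, here is a minimal sketch that packs the values into a flat vector in the same order as get_minibatch and then reads them back. It uses plain NumPy only. The helper names pack_parameters and unpack_parameters are hypothetical (they are not part of TFFRCNN), and the values passed for scale_mapping, aspect_heights and aspect_widths are made-up placeholders, since the notes above could not confirm that these config keys exist.

```python
import numpy as np

def pack_parameters(scales, scale_mapping, aspect_heights, aspect_widths):
    """Pack the values into one flat float32 vector, same order as get_minibatch."""
    num_scale = len(scales)
    num_aspect = len(aspect_heights)
    blob = np.zeros(2 + 2 * num_scale + 2 * num_aspect, dtype=np.float32)
    blob[0] = num_scale
    blob[1] = num_aspect
    blob[2:2 + num_scale] = scales
    blob[2 + num_scale:2 + 2 * num_scale] = scale_mapping
    off = 2 + 2 * num_scale                      # start of the aspect section
    blob[off:off + num_aspect] = aspect_heights
    blob[off + num_aspect:off + 2 * num_aspect] = aspect_widths
    return blob

def unpack_parameters(blob):
    """Read the two counts first, then slice the rest of the vector with them."""
    num_scale = int(blob[0])
    num_aspect = int(blob[1])
    off = 2 + 2 * num_scale
    return {
        'scales': blob[2:2 + num_scale],
        'scale_mapping': blob[2 + num_scale:off],
        'aspect_heights': blob[off:off + num_aspect],
        'aspect_widths': blob[off + num_aspect:off + 2 * num_aspect],
    }

# Placeholder values, chosen only for illustration.
params = pack_parameters(scales=(600,), scale_mapping=(0,),
                         aspect_heights=(1.0,), aspect_widths=(1.0,))
print(params.shape)   # (6,)
fields = unpack_parameters(params)
```

With one scale and one aspect ratio the vector has 2 + 2 + 2 = 6 entries, which matches the "# 6?" remark in the source comment.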

2._get_image_blob(roidb)

Takes the incoming roidb, flips each image if needed, subtracts the pixel means and rescales it, and stores the processed images in the processed_ims list. That list is passed to im_list_to_blob(...), which returns the image data blob. The function is called by get_minibatch(...) to build the 'data' field of blobs.

The difference from the function of the same name in roi_data_layer/minibatch.py (which rescales each image to a single target_size) is that this one rescales with multiple scales, TRAIN.SCALES_BASE = (0.25, 0.5, 1.0, 2.0, 3.0). Why use multiple scales here? In practice this code path does not appear to be called.

def _get_image_blob(roidb):
    """Builds an input blob from the images in the roidb at the different scales."""
    num_images = len(roidb)
    # List of rescaled images, passed to im_list_to_blob(...) to obtain the image data blob
    processed_ims = []
    for i in xrange(num_images):
        # read image
        im = cv2.imread(roidb[i]['image'])
        if roidb[i]['flipped']:
            im = im[:, ::-1, :]
        im_orig = im.astype(np.float32, copy=True)
        im_orig -= cfg.PIXEL_MEANS
        # build image pyramid
        # This is where it differs from _get_image_blob(...) in roi_data_layer/minibatch.py!
        # Default TRAIN.SCALES_BASE = (0.25, 0.5, 1.0, 2.0, 3.0)
        # Why use multiple scales here? (scales to compute real features)
        for im_scale in cfg.TRAIN.SCALES_BASE:
            im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
                        interpolation=cv2.INTER_LINEAR)
            processed_ims.append(im)
    # Create a blob to hold the input images (see blob.py)
    blob = im_list_to_blob(processed_ims)
    return blob
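
The effect of the image pyramid on the blob shape can be sketched without OpenCV. The example below is only an illustration under several assumptions: resize_nearest is a hypothetical nearest-neighbour stand-in for cv2.resize (the real code uses bilinear interpolation), pad_to_blob is a simplified stand-in for im_list_to_blob that is assumed to zero-pad every image to the largest height and width (the actual function in blob.py may differ), and the two input images are random arrays rather than files from a roidb.

```python
import numpy as np

SCALES_BASE = (0.25, 0.5, 1.0, 2.0, 3.0)  # default quoted in the notes above

def resize_nearest(im, scale):
    """Nearest-neighbour resize; a dependency-free stand-in for cv2.resize."""
    h, w = im.shape[:2]
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return im[rows][:, cols]

def pad_to_blob(ims):
    """Zero-pad every image to the largest height/width and stack them as (N, H, W, 3)."""
    max_h = max(im.shape[0] for im in ims)
    max_w = max(im.shape[1] for im in ims)
    blob = np.zeros((len(ims), max_h, max_w, 3), dtype=np.float32)
    for i, im in enumerate(ims):
        blob[i, :im.shape[0], :im.shape[1], :] = im
    return blob

# Two fake 8x12 "images" instead of files read with cv2.imread.
images = [np.random.rand(8, 12, 3).astype(np.float32) for _ in range(2)]
# Same loop order as _get_image_blob: outer loop over images, inner loop over scales.
processed = [resize_nearest(im, s) for im in images for s in SCALES_BASE]
blob = pad_to_blob(processed)
print(blob.shape)  # (10, 24, 36, 3)
```

Because the outer loop runs over images and the inner loop over scales, image i at scale index s ends up at position i * len(SCALES_BASE) + s in the blob. Every entry is padded to the size of the largest (3.0x) copy, so the small scales waste most of their slot.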

3._project_im_rois(im_rois, im_scale_factor)

Rescales the RoIs by the image scale factor. Never called.

def _project_im_rois(im_rois, im_scale_factor):
    """Project image RoIs into the rescaled training image."""
    rois = im_rois * im_scale_factor
    return rois
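
As a quick illustration of what this projection does, the sketch below applies the same multiplication to a single made-up box. The coordinates and the scale factor of 0.5 are arbitrary values chosen for the example, and the (x1, y1, x2, y2) column order is an assumption.

```python
import numpy as np

def project_im_rois(im_rois, im_scale_factor):
    """Scale RoI coordinates so they match the rescaled image."""
    return im_rois * im_scale_factor

# One made-up box, assumed to be (x1, y1, x2, y2) in original image coordinates.
rois = np.array([[10, 20, 110, 220]], dtype=np.float32)
scaled = project_im_rois(rois, 0.5)
print(scaled)  # the box becomes (5, 10, 55, 110)
```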

4._get_bbox_regression_labels(bbox_target_data, num_classes)

Expands the N * 5 bbox_targets to N * (4 * num_classes), where only the four columns of each row's own class hold non-zero regression targets (this is the shape the network accepts). It builds the N * (4 * num_classes) bbox_loss_weights in the same way and returns bbox_targets and bbox_loss_weights. Never called.

# Expand the N*5 bbox_targets to N*(4*num_classes); only one class has non-zero regression targets
# Build the N*(4*num_classes) bbox_loss_weights
def _get_bbox_regression_labels(bbox_target_data, num_classes):
    """
    Bounding-box regression targets are stored in a compact form in the roidb.
    This function expands those targets into the 4-of-4*K representation used
    by the network (i.e. only one class has non-zero targets). The loss weights
    are similarly expanded.
    Returns:
        bbox_target_data (ndarray): N x 4K blob of regression targets
        bbox_loss_weights (ndarray): N x 4K blob of loss weights
    """
    clss = bbox_target_data[:, 0]
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    bbox_loss_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    inds = np.where(clss > 0)[0]  # exclude background
    for ind in inds:
        cls = clss[ind]
        start = 4 * cls
        end = start + 4
        # Expand the N*5 bbox_targets to N*(4*num_classes); only one class has non-zero regression targets
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        # Shape N*(4*num_classes); only the four columns of that class are 1, the rest are 0
        bbox_loss_weights[ind, start:end] = [1., 1., 1., 1.]
    return bbox_targets, bbox_loss_weights
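
A small worked example may make the "4-of-4*K" expansion clearer. The sketch below reimplements the same idea on made-up data with K = 3 classes. The function name expand_bbox_targets is hypothetical, and unlike the quoted source it casts the class column to int before slicing, on the assumption that recent NumPy versions reject float slice indices.

```python
import numpy as np

def expand_bbox_targets(bbox_target_data, num_classes):
    """Expand N x 5 rows (class, tx, ty, tw, th) to N x 4K targets and weights."""
    clss = bbox_target_data[:, 0].astype(int)   # cast so the class can be used in a slice
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    bbox_loss_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    for ind in np.where(clss > 0)[0]:           # class 0 is background and stays all zero
        start = 4 * clss[ind]
        bbox_targets[ind, start:start + 4] = bbox_target_data[ind, 1:]
        bbox_loss_weights[ind, start:start + 4] = 1.0
    return bbox_targets, bbox_loss_weights

# Made-up data: one background RoI and one RoI of class 2, with K = 3 classes.
data = np.array([[0, 0.0, 0.0, 0.0, 0.0],
                 [2, 0.1, 0.2, 0.3, 0.4]], dtype=np.float32)
targets, weights = expand_bbox_targets(data, num_classes=3)
print(targets.shape)   # (2, 12)
print(weights[1])      # ones in columns 8..11, zeros elsewhere
```

The background row stays entirely zero in both arrays, so it contributes nothing to the regression loss. The class-2 row has its four targets written to columns 8 to 11 (4 * 2 up to 4 * 2 + 4).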

5._vis_minibatch(im_blob, rois_blob, labels_blob, sublabels_blob)

Draws each RoI as a rectangle on its image and prints the related information. Never called.

# Draw the RoI rectangles and print the related information
def _vis_minibatch(im_blob, rois_blob, labels_blob, sublabels_blob):
    """Visualize a mini-batch for debugging."""
    import matplotlib.pyplot as plt
    for i in xrange(rois_blob.shape[0]):
        # 1 (source image index of the RoI) + 4 (RoI coordinates)
        rois = rois_blob[i, :]
        # index of the source image of the RoI
        im_ind = rois[0]
        roi = rois[2:]
        im = im_blob[im_ind, :, :, :].transpose((1, 2, 0)).copy()
        im += cfg.PIXEL_MEANS
        im = im[:, :, (2, 1, 0)]
        im = im.astype(np.uint8)
        cls = labels_blob[i]
        subcls = sublabels_blob[i]
        plt.imshow(im)
        print 'class: ', cls, ' subclass: ', subcls
        plt.gca().add_patch(
            plt.Rectangle((roi[0], roi[1]), roi[2] - roi[0],
                          roi[3] - roi[1], fill=False,
                          edgecolor='r', linewidth=3)
            )
        plt.show()
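
The image handling inside _vis_minibatch reverses the preprocessing done in _get_image_blob: it adds the pixel means back, swaps the channel order from BGR to RGB for matplotlib, and casts to uint8. The following is a minimal sketch of that round trip on a tiny synthetic image. The PIXEL_MEANS values are assumed to be the usual Fast R-CNN defaults (check cfg.PIXEL_MEANS in your own config), to_displayable is a hypothetical helper name, and the rounding and clipping step is an addition that the quoted source does not perform.

```python
import numpy as np

# Assumed BGR pixel means (the usual Fast R-CNN default); verify against cfg.PIXEL_MEANS.
PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]], dtype=np.float32)

def to_displayable(im_bgr_centered):
    """Undo mean subtraction and convert BGR float data to an RGB uint8 image."""
    im = im_bgr_centered + PIXEL_MEANS       # restore the original pixel range
    im = im[:, :, [2, 1, 0]]                 # BGR -> RGB for matplotlib
    # Round and clip before casting (extra safety, not in the original code).
    return np.clip(np.round(im), 0, 255).astype(np.uint8)

# A 4x4 synthetic image whose every pixel is BGR = (10, 20, 30).
original = np.zeros((4, 4, 3), dtype=np.float32) + np.array([10, 20, 30], dtype=np.float32)
restored = to_displayable(original - PIXEL_MEANS)
print(restored[0, 0])   # RGB pixel with values 30, 20, 10
```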


Origin www.cnblogs.com/deeplearning1314/p/11325018.html