This post is part of a series of notes walking through the source code of CharlesShang/TFFRCNN on GitHub.
--------------- personal study notes ---------------
---------------- The author Wu Jiang --------------
This file is similar to roi_data_layer/minibatch.py; some of the functions in it may never be executed.
"""Compute minibatch blobs for training a Fast R-CNN network."""
1.get_minibatch(roidb, num_classes)
Updates the 'info_boxes' field of roidb[i] and adds the 'data' field (the image data blob) and the 'parameters' field. It is unclear what a row of info_boxes contains or what the hard-coded 18 refers to. The 'parameters' blob stores num_scale (the number of image scales), num_aspect (the number of aspect ratios), cfg.TRAIN.SCALES, cfg.TRAIN.SCALE_MAPPING, cfg.TRAIN.ASPECT_HEIGHTS and cfg.TRAIN.ASPECT_WIDTHS. It is unclear why reading the last three values does not raise an error; possibly this function is never executed. Called by _get_next_minibatch(...) in gt_data_layer/layer.py.
# Updates roidb[i]['info_boxes'] and adds the 'data' and 'parameters' fields
def get_minibatch(roidb, num_classes):
    """Given a roidb, construct a minibatch sampled from it."""
    num_images = len(roidb)
    # TRAIN.BATCH_SIZE defaults to 128
    assert(cfg.TRAIN.BATCH_SIZE % num_images == 0), \
        'num_images ({}) must divide BATCH_SIZE ({})'. \
        format(num_images, cfg.TRAIN.BATCH_SIZE)

    # Get the input image blob, formatted for caffe
    im_blob = _get_image_blob(roidb)

    # build the box information blob
    # 18 is hard-coded here; what does it refer to?
    info_boxes_blob = np.zeros((0, 18), dtype=np.float32)
    # TRAIN.SCALES defaults to (600,)
    num_scale = len(cfg.TRAIN.SCALES)
    for i in xrange(num_images):
        info_boxes = roidb[i]['info_boxes']

        # change the batch index
        # Why this adjustment? What do the 3rd and 8th columns represent?
        info_boxes[:, 2] += i * num_scale
        info_boxes[:, 7] += i * num_scale

        info_boxes_blob = np.vstack((info_boxes_blob, info_boxes))

    # build the parameter blob
    # TRAIN.ASPECTS defaults to (1,): only one? (Aspect ratio to use during training)
    num_aspect = len(cfg.TRAIN.ASPECTS)
    num = 2 + 2 * num_scale + 2 * num_aspect  # 6?
    # parameters_blob stores the following:
    #   num_scale                number of image scales, len(cfg.TRAIN.SCALES) = 1
    #   num_aspect               number of aspect ratios, len(cfg.TRAIN.ASPECTS) = 1
    #   cfg.TRAIN.SCALES         (600,)
    #   cfg.TRAIN.SCALE_MAPPING  why does reading this value not raise an error? Perhaps this function is never called
    #   cfg.TRAIN.ASPECT_HEIGHTS why no error here?
    #   cfg.TRAIN.ASPECT_WIDTHS  why no error here?
    parameters_blob = np.zeros((num), dtype=np.float32)
    parameters_blob[0] = num_scale
    parameters_blob[1] = num_aspect
    parameters_blob[2:2+num_scale] = cfg.TRAIN.SCALES
    parameters_blob[2+num_scale:2+2*num_scale] = cfg.TRAIN.SCALE_MAPPING
    parameters_blob[2+2*num_scale:2+2*num_scale+num_aspect] = cfg.TRAIN.ASPECT_HEIGHTS
    parameters_blob[2+2*num_scale+num_aspect:2+2*num_scale+2*num_aspect] = cfg.TRAIN.ASPECT_WIDTHS

    # For debug visualizations
    # _vis_minibatch(im_blob, rois_blob, labels_blob, sublabels_blob)

    blobs = {'data': im_blob,
             'info_boxes': info_boxes_blob,
             'parameters': parameters_blob}

    return blobs
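The layout of parameters_blob and the batch-index shift can be checked on toy data. The following is only a minimal sketch in Python 3 with NumPy, not the repository's code: the config values (in particular SCALE_MAPPING, ASPECT_HEIGHTS and ASPECT_WIDTHS) are made-up placeholders, and build_parameters_blob and shift_batch_index are hypothetical helper names.

```python
import numpy as np

# Placeholder config values for illustration only (assumptions, not repo defaults).
SCALES = (600,)
SCALE_MAPPING = (1.0,)
ASPECT_HEIGHTS = (1.0,)
ASPECT_WIDTHS = (1.0,)


def build_parameters_blob(scales, scale_mapping, aspect_heights, aspect_widths):
    """Pack [num_scale, num_aspect, scales, scale_mapping, heights, widths]."""
    num_scale = len(scales)
    num_aspect = len(aspect_heights)
    num = 2 + 2 * num_scale + 2 * num_aspect
    blob = np.zeros(num, dtype=np.float32)
    blob[0] = num_scale
    blob[1] = num_aspect
    start = 2
    blob[start:start + num_scale] = scales
    start += num_scale
    blob[start:start + num_scale] = scale_mapping
    start += num_scale
    blob[start:start + num_aspect] = aspect_heights
    start += num_aspect
    blob[start:start + num_aspect] = aspect_widths
    return blob


def shift_batch_index(info_boxes_per_image, num_scale):
    """Offset columns 2 and 7 by i * num_scale for the i-th image, then stack."""
    stacked = np.zeros((0, 18), dtype=np.float32)
    for i, info_boxes in enumerate(info_boxes_per_image):
        shifted = info_boxes.copy()  # unlike the original, leave the input untouched
        shifted[:, 2] += i * num_scale
        shifted[:, 7] += i * num_scale
        stacked = np.vstack((stacked, shifted))
    return stacked


parameters_blob = build_parameters_blob(
    SCALES, SCALE_MAPPING, ASPECT_HEIGHTS, ASPECT_WIDTHS)
print(parameters_blob)

# Two toy images with 2 and 1 all-zero box rows respectively.
boxes = [np.zeros((2, 18), dtype=np.float32),
         np.zeros((1, 18), dtype=np.float32)]
info_boxes_blob = shift_batch_index(boxes, num_scale=len(SCALES))
print(info_boxes_blob[:, [2, 7]])
```

With one scale and one aspect ratio the blob has 2 + 2 + 2 = 6 entries, which matches the "# 6?" remark in the code. The shift suggests that columns 2 and 7 index into the stacked image pyramid, so rows from the i-th image point past the pyramid levels of the images before it; that reading is a guess, not something the source confirms.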
2._get_image_blob(roidb)
For each image in the incoming roidb, subtracts the pixel means and rescales the image. The processed images are collected in the processed_ims list, which is passed to im_list_to_blob(...) to build the image data blob. Called by get_minibatch(...) to fill the 'data' field of blobs.
The difference from roi_data_layer/minibatch.py (where target_size rescales each image to a single scale) is that this function rescales to multiple scales, TRAIN.SCALES_BASE = (0.25, 0.5, 1.0, 2.0, 3.0). Why use multiple scales here? It appears not to be called in practice.
def _get_image_blob(roidb):
    """Builds an input blob from the images in the roidb at the different scales.
    """
    num_images = len(roidb)
    # list of rescaled images, passed to im_list_to_blob(...) to build the image data blob
    processed_ims = []

    for i in xrange(num_images):
        # read image
        im = cv2.imread(roidb[i]['image'])
        if roidb[i]['flipped']:
            im = im[:, ::-1, :]

        im_orig = im.astype(np.float32, copy=True)
        im_orig -= cfg.PIXEL_MEANS

        # build image pyramid
        # This is where it differs from _get_image_blob(...) in roi_data_layer/minibatch.py!
        # TRAIN.SCALES_BASE defaults to (0.25, 0.5, 1.0, 2.0, 3.0)
        # Why use multiple scales here? (Scales to compute real features)
        for im_scale in cfg.TRAIN.SCALES_BASE:
            im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
                            interpolation=cv2.INTER_LINEAR)

            processed_ims.append(im)

    # Create a blob to hold the input images (see blob.py)
    blob = im_list_to_blob(processed_ims)

    return blob
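To see how many images end up in the blob, here is a minimal NumPy-only sketch in Python 3. It is not the repository's code: nearest_resize is a hypothetical stand-in for cv2.resize (nearest-neighbour rather than bilinear), pad_to_blob is a simplified stand-in for im_list_to_blob that assumes images are zero-padded to the largest height and width, and the PIXEL_MEANS values are assumed for illustration.

```python
import numpy as np

# Assumed per-channel means (illustrative); SCALES_BASE is taken from the notes above.
PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]], dtype=np.float32)
SCALES_BASE = (0.25, 0.5, 1.0, 2.0, 3.0)


def nearest_resize(im, scale):
    """Hypothetical stand-in for cv2.resize: nearest-neighbour rescaling."""
    h, w = im.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return im[rows][:, cols]


def pad_to_blob(ims):
    """Simplified stand-in for im_list_to_blob: zero-pad to the largest image."""
    max_h = max(im.shape[0] for im in ims)
    max_w = max(im.shape[1] for im in ims)
    blob = np.zeros((len(ims), max_h, max_w, 3), dtype=np.float32)
    for i, im in enumerate(ims):
        blob[i, :im.shape[0], :im.shape[1], :] = im
    return blob


def image_pyramid_blob(images, flipped):
    """Mean-subtract each image, rescale it to every base scale, and stack."""
    processed = []
    for im, flip in zip(images, flipped):
        if flip:
            im = im[:, ::-1, :]  # horizontal flip
        im_orig = im.astype(np.float32) - PIXEL_MEANS
        for scale in SCALES_BASE:
            processed.append(nearest_resize(im_orig, scale))
    return pad_to_blob(processed)


images = [np.full((8, 12, 3), 128, dtype=np.uint8),
          np.full((4, 4, 3), 128, dtype=np.uint8)]
blob = image_pyramid_blob(images, flipped=[False, True])
print(blob.shape)
```

Each input image contributes len(SCALES_BASE) entries, so two images with five scales give ten entries in the blob, all padded to the size of the largest rescaled image.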
3._project_im_rois(im_rois, im_scale_factor)
Scales the RoIs by the image scale factor. Not called anywhere.
def _project_im_rois(im_rois, im_scale_factor):
    """Project image RoIs into the rescaled training image."""
    rois = im_rois * im_scale_factor
    return rois
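As a quick illustration with toy values (Python 3 with NumPy; the function is renamed project_im_rois here to mark it as a standalone copy), projecting RoIs simply multiplies every coordinate by the scale factor:

```python
import numpy as np


def project_im_rois(im_rois, im_scale_factor):
    """Project image RoIs into the rescaled image (same arithmetic as above)."""
    return im_rois * im_scale_factor


# One toy RoI with four coordinates, projected into an image rescaled by 0.5.
rois = np.array([[10, 20, 110, 220]], dtype=np.float32)
projected = project_im_rois(rois, 0.5)
print(projected)
```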
4._get_bbox_regression_labels(bbox_target_data, num_classes)
Expands the N*5 bbox_target_data into N*(4*num_classes) bbox_targets, in which only the four entries belonging to each RoI's class hold non-zero regression targets (this is the shape the network expects). It also builds bbox_loss_weights with the same N*(4*num_classes) shape. Returns bbox_targets and bbox_loss_weights. Not called anywhere.
# Expands N*5 bbox_targets to N*(4*num_classes); only one class has non-zero regression targets
# Builds bbox_loss_weights of shape N*(4*num_classes)
def _get_bbox_regression_labels(bbox_target_data, num_classes):
    """
    Bounding-box regression targets are stored in a compact form in the roidb.
    This function expands those targets into the 4-of-4*K representation used
    by the network (i.e. only one class has non-zero targets).
    The loss weights are similarly expanded.
    Returns:
        bbox_target_data (ndarray): N x 4K blob of regression targets
        bbox_loss_weights (ndarray): N x 4K blob of loss weights
    """
    clss = bbox_target_data[:, 0]
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    bbox_loss_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    inds = np.where(clss > 0)[0]  # drop background
    for ind in inds:
        cls = clss[ind]
        start = 4 * cls
        end = start + 4
        # expand N*5 bbox_targets to N*(4*num_classes); only one class has non-zero regression targets
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        # shape N*(4*num_classes): 1, 1, 1, 1 for the matching class, 0 elsewhere
        bbox_loss_weights[ind, start:end] = [1., 1., 1., 1.]
    return bbox_targets, bbox_loss_weights
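The expansion is easiest to follow on a tiny example. The following is a minimal sketch in Python 3 with NumPy, assuming each input row is a class label followed by four regression targets. expand_bbox_targets is a hypothetical name, and the integer cast is added here only so the labels can be used as slice bounds.

```python
import numpy as np


def expand_bbox_targets(bbox_target_data, num_classes):
    """Expand N x 5 (class + 4 targets) into N x 4K targets and loss weights."""
    clss = bbox_target_data[:, 0].astype(int)  # cast so labels work as slice bounds
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    bbox_loss_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    for ind in np.where(clss > 0)[0]:  # class 0 is background and stays all-zero
        start = 4 * clss[ind]
        end = start + 4
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
        bbox_loss_weights[ind, start:end] = 1.0
    return bbox_targets, bbox_loss_weights


data = np.array([
    [0, 0.0, 0.0, 0.0, 0.0],  # background RoI
    [2, 0.1, 0.2, 0.3, 0.4],  # RoI assigned to class 2
], dtype=np.float32)
targets, weights = expand_bbox_targets(data, num_classes=3)
print(targets)
print(weights)
```

With num_classes = 3 every row has 12 columns. The class-2 RoI fills columns 8 to 11 and gets a weight of 1 there, while the background row stays entirely zero and therefore contributes nothing to the regression loss.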
5._vis_minibatch(im_blob, rois_blob, labels_blob, sublabels_blob)
Draws each RoI as a rectangle on its source image and prints the related information. Not called anywhere.
# Draws each roi as a rectangle and prints the related information
def _vis_minibatch(im_blob, rois_blob, labels_blob, sublabels_blob):
    """Visualize a mini-batch for debugging."""
    import matplotlib.pyplot as plt
    for i in xrange(rois_blob.shape[0]):
        # 1 (source image index) + 4 (roi coordinates)
        rois = rois_blob[i, :]
        # index of the roi's source image
        im_ind = rois[0]
        roi = rois[2:]
        im = im_blob[im_ind, :, :, :].transpose((1, 2, 0)).copy()
        im += cfg.PIXEL_MEANS
        im = im[:, :, (2, 1, 0)]
        im = im.astype(np.uint8)
        cls = labels_blob[i]
        subcls = sublabels_blob[i]
        plt.imshow(im)
        print 'class: ', cls, ' subclass: ', subcls
        plt.gca().add_patch(
            plt.Rectangle((roi[0], roi[1]), roi[2] - roi[0],
                          roi[3] - roi[1], fill=False,
                          edgecolor='r', linewidth=3)
            )
        plt.show()
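The image handling inside the loop (undoing the mean subtraction and reordering the channels for display) can be tried without matplotlib. This is a minimal sketch in Python 3 with NumPy, assuming the blob is laid out as N x 3 x H x W with BGR channels, as the transpose in the code above implies. blob_to_rgb is a hypothetical helper name and the PIXEL_MEANS values are assumed for illustration.

```python
import numpy as np

# Assumed per-channel BGR means (illustrative values).
PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]], dtype=np.float32)


def blob_to_rgb(im_blob, im_ind):
    """Turn one mean-subtracted N x 3 x H x W BGR blob entry into an RGB uint8 image."""
    im = im_blob[int(im_ind)].transpose((1, 2, 0)).copy()  # C x H x W -> H x W x C
    im += PIXEL_MEANS                                      # undo mean subtraction
    im = im[:, :, (2, 1, 0)]                               # BGR -> RGB
    return np.clip(np.rint(im), 0, 255).astype(np.uint8)


# Build a toy 2 x 2 image with constant BGR colour (10, 20, 30).
bgr = np.zeros((2, 2, 3), dtype=np.float32)
bgr[:] = (10, 20, 30)
toy_blob = (bgr - PIXEL_MEANS).transpose((2, 0, 1))[np.newaxis]
rgb = blob_to_rgb(toy_blob, im_ind=0.0)
print(rgb[0, 0])
```

The int(...) cast on im_ind is there because an index read out of a float32 rois array is itself a float, which recent NumPy versions do not accept as an array index.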