Mask R-CNN hands-on guide: object detection and segmentation to detect drones

Object detection is a computer vision technique used to identify and locate objects in images. There are many detection algorithms, and good summaries of them are available online.

Mask R-CNN is an extension of object detection: it generates a bounding box and a segmentation mask for each object detected in an image. This article is a guide to training Mask R-CNN on a custom dataset. I hope it helps some of you simplify the process.

https://github.com/matterport/Mask_RCNN/blob/master/samples/shapes/train_shapes.ipynb

Libraries and packages

The main package for the algorithm is mrcnn. Install the library and import it into your environment.

!pip install mrcnn

from mrcnn.config import Config
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
from mrcnn.model import log

mrcnn is not yet compatible with TensorFlow 2.0, so make sure you revert to TensorFlow 1.x. Since I develop on Colab, I use a magic function to switch back to TensorFlow 1.x. This is a common criticism of TensorFlow: backward compatibility regularly breaks between releases.

%tensorflow_version 1.x
import tensorflow as tf

In TensorFlow 2.0, tf.random_shuffle was renamed to tf.random.shuffle, which causes an incompatibility. If you change the shuffle calls in the mrcnn code, you can run it under TensorFlow 2.0.
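For illustration, here is a minimal sketch of that rename, patching the installed copy of mrcnn in place (this assumes a standard pip install and is not code from the original post; the guide below sticks with TF 1.x regardless):

import importlib.util
import pathlib

# Hypothetical patch: rename the TF1 op that was removed in TensorFlow 2.x.
# Locate the installed mrcnn/model.py without importing mrcnn.model itself.
spec = importlib.util.find_spec("mrcnn.model")
model_py = pathlib.Path(spec.origin)
src = model_py.read_text()
model_py.write_text(src.replace("tf.random_shuffle", "tf.random.shuffle"))
# Restart the runtime afterwards so the patched file gets reloaded.

Other TF 1.x-only calls may still lurk elsewhere in mrcnn, which is another reason this guide stays on TF 1.x.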

On Colab, it is also safest to pin Keras to an earlier version. If you encounter a Keras-related error, run the command below; otherwise, ignore it.

!pip install keras==2.2.5

Preprocessing

The mrcnn package is quite flexible about the data format it accepts, so here we process the data directly into NumPy arrays.

One caveat first: cv2 could not read video17_295 and video19_1900 correctly, so I filter out these two images while building the list of file names.

dir = "Database1/"# filter out image that cant be read
prob_list = ['video17_295','video19_1900'] # cant read format
txt_list = [f for f in os.listdir(dir) if f.endswith(".txt") and f[:-4] not in prob_list]
file_list = set([re.match("\w+(?=.)",f)[0] for f in txt_list])# create data list as tuple of (jpeg,txt)
data_list = []
for f in file_list:
    data_list.append((f+".JPEG",f+".txt"))

Things to do next:

  • Check if the label exists (some images do not contain drones)
  • Read and process the images
  • Read and process the bounding-box coordinates
  • Draw the bounding boxes (for visualization purposes)

X, y = [], []
img_box = []
DIMENSION = 128  # set low resolution to decrease training time

for i in range(len(data_list)):
    # get bounding box and check if the label exists
    with open(dir + data_list[i][1], "r") as f:
        box = f.read().split()
    if len(box) != 5:
        continue  # skip data that does not contain a label
    box = [float(s) for s in box[1:]]  # YOLO format: class x_center y_center width height

    # read image and convert BGR -> RGB
    img = cv2.imread(dir + data_list[i][0])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # resize img to 128 x 128
    img = cv2.resize(img, (DIMENSION, DIMENSION), interpolation=cv2.INTER_LINEAR)

    # convert the normalized YOLO coordinates to pixel values
    p1, p2 = int(box[0] * DIMENSION), int(box[1] * DIMENSION)  # box center
    p3, p4 = int(box[2] * DIMENSION), int(box[3] * DIMENSION)  # box width/height
    ymin, ymax, xmin, xmax = p2 - p4 // 2, p2 + p4 // 2, p1 - p3 // 2, p1 + p3 // 2

    # draw bounding box (for visualization purposes)
    draw = cv2.rectangle(img.copy(), (xmax, ymax), (xmin, ymin), color=(255, 255, 0), thickness=1)

    # store data only if the box spans at least 20 pixels vertically (drop tiny drones)
    if ymax - ymin >= 20:
        X.append(img)
        y.append([ymin, ymax, xmin, xmax])
        img_box.append(draw)

# convert to numpy arrays
X = np.array(X).astype(np.uint8)
y = np.array(y)
img_box = np.array(img_box)

Before converting to NumPy arrays, I take a subset of the dataset to cut down training time; if you have the computing power, you can skip this step.
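The subsetting code itself is not shown in the post; here is a minimal sketch, assuming you simply shrink data_list before the preprocessing loop (the subset size is an arbitrary placeholder):

import random

# take a random, reproducible subset to keep training fast
random.seed(42)
SUBSET_SIZE = 1000  # hypothetical value; tune it to your compute budget
data_list = random.sample(data_list, min(SUBSET_SIZE, len(data_list)))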

(The original post shows a few sample images with their drawn bounding boxes here.)

MRCNN processing

Now let's look at mrcnn itself. Before training, we need to define an mrcnn Dataset class, which supplies information about each image, such as its class and the positions of the objects it contains. The base class lives in mrcnn.utils.

Things get a bit tricky here, and you will need to read some of the source code. These are the functions you need to override:

https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/utils.py

  • add_class, which determines the number of classes in the model

  • add_image, which defines the image id and image path (if applicable)

  • load_image, where the image data is loaded

  • load_mask, which provides the mask / bounding-box information for an image

# define the drones dataset using the mrcnn utils.Dataset class
class DronesDataset(utils.Dataset):
    def __init__(self, X, y):  # init with numpy X, y
        self.X = X
        self.y = y
        super().__init__()

    def load_dataset(self):
        self.add_class("dataset", 1, "drones")  # only 1 class, drones
        for i in range(len(self.X)):
            self.add_image("dataset", i, path=None)

    def load_image(self, image_id):
        image = self.X[image_id]  # where image_id is the index into X
        return image

    def load_mask(self, image_id):
        # get details of the image
        info = self.image_info[image_id]
        # each image contains exactly one drone, so one mask channel suffices
        masks = np.zeros([128, 128, 1], dtype='uint8')
        box = self.y[info["id"]]
        row_s, row_e = box[0], box[1]
        col_s, col_e = box[2], box[3]
        # create a mask with the same boundaries as the bounding box
        masks[row_s:row_e, col_s:col_e, 0] = 1
        class_ids = [1]  # the single instance is a drone
        return masks, np.array(class_ids).astype(np.uint8)

We have formatted the image as a NumPy array, so we can simply initialize the Dataset class with the array and load the image and bounding box by indexing into the array.

Next, split the training and test sets.

# train test split 80:20
np.random.seed(42)  # for reproducibility
p = np.random.permutation(len(X))
X = X[p].copy()
y = y[p].copy()

split = int(0.8 * len(X))

X_train = X[:split]
y_train = y[:split]
X_val = X[split:]
y_val = y[split:]

Now load the data into the dataset class.

# load dataset into mrcnn dataset class
train_dataset = DronesDataset(X_train, y_train)
train_dataset.load_dataset()
train_dataset.prepare()

val_dataset = DronesDataset(X_val, y_val)
val_dataset.load_dataset()
val_dataset.prepare()

The prepare() function uses the image IDs and class IDs to get the data ready for the mrcnn model. Next, we modify the Config class imported from mrcnn. Config determines the variables used in training and should be adjusted to your dataset.

The variables below are not exhaustive; you can refer to the complete list in the documentation.

class DronesConfig(Config):
    # Give the configuration a recognizable name
    NAME = "drones"

    # Train on 1 GPU and 2 images per GPU.
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    # Number of classes (including background)
    NUM_CLASSES = 1 + 1  # background + drones

    # Use small images for faster training.
    IMAGE_MIN_DIM = 128
    IMAGE_MAX_DIM = 128

    # Reduce training ROIs per image because the images are small and have few objects.
    TRAIN_ROIS_PER_IMAGE = 20

    # Use smaller anchors because our images and objects are small
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # anchor side in pixels

    # Set appropriate steps per epoch and validation steps
    STEPS_PER_EPOCH = len(X_train) // (GPU_COUNT * IMAGES_PER_GPU)
    VALIDATION_STEPS = len(X_val) // (GPU_COUNT * IMAGES_PER_GPU)

    # Skip detections with < 70% confidence
    DETECTION_MIN_CONFIDENCE = 0.7

config = DronesConfig()
config.display()

Depending on your computing power, you may need to adjust these variables; otherwise, training can get stuck at "Epoch 1" without any error message. There is even a GitHub issue about this problem, with many proposed solutions. If you run into it, check the thread and test a few of the suggestions.

https://github.com/matterport/Mask_RCNN/issues/287
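For example, one workaround that comes up repeatedly in that thread is to disable multiprocessing in the fit_generator call inside MaskRCNN.train(). This edits the installed library source, so treat it as a suggestion from the issue thread rather than part of this guide:

import importlib.util
import pathlib

# Suggested workaround from the issue thread: force single-process data
# loading in the fit_generator call inside mrcnn/model.py.
spec = importlib.util.find_spec("mrcnn.model")
model_py = pathlib.Path(spec.origin)
src = model_py.read_text()
model_py.write_text(src.replace("use_multiprocessing=True", "use_multiprocessing=False"))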

MRCNN training

mrcnn provides weights pre-trained on the COCO and ImageNet datasets. To use them for transfer learning, we need to download them into the environment (remember to define ROOT_DIR first).

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")

# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

Create a model and use pre-trained weights.

# Create model in training mode using gpu
with tf.device("/gpu:0"):
    model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)

# Which weights to start with?
init_with = "imagenet"  # imagenet, coco

if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights trained on MS COCO, but skip layers that
    # are different due to the different number of classes
    # See README for instructions to download the COCO weights
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])

Now, we can start the actual training.

# unfreeze the head and train only the last layers
model.train(train_dataset, val_dataset,
            learning_rate=config.LEARNING_RATE,
            epochs=5,
            layers='heads')

Here I only train the head layers to detect drones in the dataset. If time permits, you should also fine-tune the model by training all the earlier layers.

model.train(train_dataset, val_dataset, 
            learning_rate=config.LEARNING_RATE / 10,
            epochs=2, 
            layers="all")

After the mrcnn model has finished training, you can save its weights with these two lines of code.

# save weights
model_path = os.path.join(MODEL_DIR, "mask_rcnn_drones.h5")
model.keras_model.save_weights(model_path)

MRCNN inference

To run inference on other images, you need to recreate the model in inference mode with a custom configuration.

# make inference
class InferenceConfig(DronesConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

inference_config = InferenceConfig()

# Recreate the model in inference mode
model = modellib.MaskRCNN(mode="inference", config=inference_config, model_dir=MODEL_DIR)

# Load trained weights
model_path = os.path.join(MODEL_DIR, "mask_rcnn_drones.h5")
model.load_weights(model_path, by_name=True)

Visualization

import random
import matplotlib.pyplot as plt

def get_ax(rows=1, cols=1, size=8):
    _, ax = plt.subplots(rows, cols, figsize=(size * cols, size * rows))
    return ax

# Test on a random image
image_id = random.choice(val_dataset.image_ids)
original_image, image_meta, gt_class_id, gt_bbox, gt_mask = \
    modellib.load_image_gt(val_dataset, inference_config, image_id, use_mini_mask=False)

results = model.detect([original_image], verbose=1)
r = results[0]
visualize.display_instances(original_image, r['rois'], r['masks'], r['class_ids'],
                            val_dataset.class_names, r['scores'], ax=get_ax())

(The original post shows the resulting detection here, with the predicted mask and bounding box overlaid.)

And that's it: we have trained an mrcnn model on a custom dataset.

Author: Benjamin Lau
