Using LabVIEW to Implement Mask R-CNN Image Instance Segmentation

Foreword

In earlier posts I introduced how to use the LabVIEW toolkit for image classification and object detection. Today we will look at how to use LabVIEW to implement Mask R-CNN image instance segmentation.


1. What is image instance segmentation?

Image instance segmentation (Instance Segmentation) is a further refinement of semantic segmentation (Semantic Segmentation): it separates objects from the background and isolates each object at the pixel level. Semantic segmentation and instance segmentation are two different concepts: semantic segmentation only distinguishes and segments objects of different categories, while instance segmentation further separates the individual instances within the same class.

Common tasks in computer vision: classification, detection, semantic segmentation, and instance segmentation.


2. What is Mask R-CNN?


Mask R-CNN is an instance segmentation algorithm that can be used for object detection, object instance segmentation, and object keypoint detection. The algorithm proceeds in the following steps:

  • First, input the picture to be processed and apply the corresponding preprocessing operations (or start from an already preprocessed picture);

  • Feed it into a pre-trained backbone network (ResNeXt, etc.) to obtain the corresponding feature map;

  • For each point in the feature map, set a predetermined ROI, obtaining multiple candidate ROIs;

  • Send these candidate ROIs to the RPN network for binary classification (foreground or background) and bounding-box (BB) regression, filtering out most of the candidates;

  • Perform the ROIAlign operation on the remaining ROIs, that is, first map the original-image pixels onto the feature map, then map the feature map onto a fixed-size feature (a small torchvision example follows this list);

  • Finally, classify these ROIs (N-category classification), run BB regression, and generate the MASK (an FCN operation inside each ROI).
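
The ROIAlign step is exposed directly in torchvision as torchvision.ops.roi_align, so it is easy to try in isolation. A minimal sketch (the feature-map size, stride, and ROI coordinates below are made up for illustration):

import torch
from torchvision.ops import roi_align

feature_map = torch.rand(1, 256, 50, 50)  # [batch, channels, H, W] from a backbone
# one ROI per row: (batch_index, x1, y1, x2, y2) in input-image coordinates
rois = torch.tensor([[0.0, 32.0, 48.0, 160.0, 224.0]])
# spatial_scale maps image coordinates onto the feature map (stride 16 here)
pooled = roi_align(feature_map, rois, output_size=(7, 7), spatial_scale=1 / 16)
print(pooled.shape)  # torch.Size([1, 256, 7, 7]), one fixed-size feature per ROI

Whatever the ROI's size on the feature map, the output is a fixed 7x7 grid, which is exactly what lets the downstream classification and mask heads work on fixed-size inputs.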

3. LabVIEW calls the Mask R-CNN image instance segmentation model

1. Mask R-CNN model acquisition and conversion

  • Install PyTorch and torchvision

  • Get the model from torchvision (we use the pre-trained model):

model = models.detection.maskrcnn_resnet50_fpn(pretrained=True)
  • Convert it to ONNX:

def get_pytorch_onnx_model(original_model):
    model = original_model
    # define the directory where the converted model will be saved
    # (dirname is defined in the complete script below)
    onnx_model_path = dirname

    # define the name of the converted model
    onnx_model_name = "maskrcnn_resnet50.onnx"

    # create the directory for the converted model
    os.makedirs(onnx_model_path, exist_ok=True)

    # get the full path to the converted model
    full_model_path = os.path.join(onnx_model_path, onnx_model_name)
    model.eval()

    x = torch.rand(1, 3, 640, 640)
    # export the model into ONNX format
    torch.onnx.export(
        original_model,
        x,
        full_model_path,
        input_names=["input"],
        output_names=["boxes", "labels", "scores", "masks"],
        dynamic_axes={"input": [0, 1, 2, 3], "boxes": [0, 1], "labels": [0], "scores": [0], "masks": [0, 1, 2, 3]},
        verbose=True, opset_version=11
    )

    return full_model_path

The complete Python code for acquiring and converting the model is as follows:

import os
import torch
import torch.onnx
from torchvision import models

dirname, filename = os.path.split(os.path.abspath(__file__))
print(dirname)

def get_pytorch_onnx_model(original_model):
    model = original_model
    # define the directory where the converted model will be saved
    onnx_model_path = dirname

    # define the name of the converted model
    onnx_model_name = "maskrcnn_resnet50.onnx"

    # create the directory for the converted model
    os.makedirs(onnx_model_path, exist_ok=True)

    # get the full path to the converted model
    full_model_path = os.path.join(onnx_model_path, onnx_model_name)
    model.eval()

    x = torch.rand(1, 3, 640, 640)
    # export the model into ONNX format
    torch.onnx.export(
        original_model,
        x,
        full_model_path,
        input_names=["input"],
        output_names=["boxes", "labels", "scores", "masks"],
        dynamic_axes={"input": [0, 1, 2, 3], "boxes": [0, 1], "labels": [0], "scores": [0], "masks": [0, 1, 2, 3]},
        verbose=True, opset_version=11
    )

    return full_model_path


model = models.detection.maskrcnn_resnet50_fpn(pretrained=True)
print(get_pytorch_onnx_model(model))
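
Before moving on to LabVIEW, it can be worth sanity-checking the exported file from Python. Below is a minimal sketch using onnxruntime (assuming it is installed, e.g. via pip install onnxruntime; the file name matches the export script above):

import numpy as np
import onnxruntime as ort

# load the exported model on the CPU provider
sess = ort.InferenceSession("maskrcnn_resnet50.onnx", providers=["CPUExecutionProvider"])

# dummy input matching the export shape; real images should be RGB, scaled to [0, 1]
x = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = sess.run(None, {"input": x})

# print the four output layers (boxes, labels, scores, masks) with shape and dtype
for meta, out in zip(sess.get_outputs(), outputs):
    print(meta.name, out.shape, out.dtype)

This also makes the output data types easy to inspect, which matters in the next section (in particular, labels comes back as int64).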

2. LabVIEW calls Mask R-CNN (mask rcnn.vi)

Note: the Mask R-CNN model cannot be loaded with OpenCV's dnn module, because some of its operators are not supported; instead we use the LabVIEW Open Neural Network Interactive Toolkit (ONNX) to load and run the model.

  • Load the ONNX model with onnxruntime and select the acceleration method


  • Image preprocessing


  • Run inference. The model we use, maskrcnn_resnet50_fpn, has four output layers: boxes, labels, scores, and masks.


  • Note that the type of labels is INT64, so our source code uses Get_Rresult_int64.vi with index 1, because labels is the second output layer, that is, subscript 1;


  • The other three outputs can be read as float32. Although masks is declared as uint8, in practice we found that its values are already normalized, so float32 works for it as well.


  • Post-process the outputs to produce the instance segmentation. Because the post-processing involves many steps, it is encapsulated in a subVI, mask_rcnn_post_process.vi (a rough Python sketch of the same idea follows this list);


  • The overall program framework chains these steps together: load the model with onnxruntime, preprocess the image, run inference, and post-process the results.


  • Looking at the instance segmentation results, the model takes longer to run than the previous ones, because it must not only find the region of each object but also trace the outline of that region. In our test image, the five people and the basketball are all boxed and segmented in different colors.

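For readers who want to prototype the post-processing outside LabVIEW first, here is a rough Python sketch of the same idea (score filtering, mask binarization, a colored overlay, and boxes). The thresholds and the [N, 1, H, W] mask layout are assumptions based on the torchvision model, not a transcription of mask_rcnn_post_process.vi:

import numpy as np
import cv2

def post_process(image, boxes, labels, scores, masks,
                 score_thresh=0.5, mask_thresh=0.5):
    # image: HxWx3 uint8 (BGR); boxes: [N, 4]; masks: [N, 1, H, W] in [0, 1]
    overlay = image.copy()
    rng = np.random.default_rng(0)               # fixed seed, stable colors
    for box, label, score, mask in zip(boxes, labels, scores, masks):
        if score < score_thresh:
            continue                             # drop low-confidence detections
        color = [int(c) for c in rng.integers(0, 255, size=3)]
        overlay[mask[0] > mask_thresh] = color   # paint the instance region
        x1, y1, x2, y2 = box.astype(int)
        cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)
        cv2.putText(image, f"{int(label)}: {score:.2f}", (x1, max(y1 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
    # blend the colored masks back onto the annotated image
    return cv2.addWeighted(image, 0.6, overlay, 0.4, 0)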

3. LabVIEW calls Mask R-CNN for real-time image segmentation (mask rcnn_camera.vi)

The overall idea is similar to the image instance segmentation above, except that a camera is used and a loop is added so that instance segmentation runs on every frame. With a 3080-series graphics card you can choose TensorRT to accelerate inference, which makes the segmentation much smoother. We also found that this model's runtime is quite sensitive to the number of detections, so if you only need to segment people, choosing a cleaner background makes the overall detection speed much faster.
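
As a point of comparison, the same frame-by-frame loop in Python might look like the sketch below (a minimal outline, assuming onnxruntime with the exported model; the TensorRT/CUDA provider names only take effect if a matching onnxruntime build and drivers are installed):

import cv2
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "maskrcnn_resnet50.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)

cap = cv2.VideoCapture(0)                 # default camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # BGR -> RGB, resize to the export size, HWC -> NCHW, scale to [0, 1]
    rgb = cv2.cvtColor(cv2.resize(frame, (640, 640)), cv2.COLOR_BGR2RGB)
    blob = rgb.transpose(2, 0, 1)[None].astype(np.float32) / 255.0
    boxes, labels, scores, masks = sess.run(None, {"input": blob})
    # ... draw boxes and masks on the frame (see the post-processing sketch above) ...
    cv2.imshow("mask rcnn", frame)
    if cv2.waitKey(1) == 27:              # press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()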

4. Training Mask R-CNN on your own dataset (pedestrian detection)

1. Preparations

  • Training requires a JupyterLab environment; if you have not installed it yet, install it with pip install jupyterlab

  • If you cannot get a JupyterLab environment working locally, you can use the free GPU environments provided by Colab or Kaggle for training

  • Training source code: mask-rcnn.ipynb

2. Start training

  • Run the code as prompted; it downloads the dependency files and the dataset (automatically or manually) and creates the dataset-parsing class


  • Define the single-epoch training function: the network structure directly uses the one already provided in torchvision and is not redefined (a sketch of this fine-tuning setup follows this list)


  • Output like the following indicates that training is in progress


  • Change this file name to the name of your own picture and run it to see the training effect

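For reference, the usual torchvision fine-tuning recipe behind this step replaces the pre-trained model's box and mask heads with heads sized for your own number of classes (here 2: background and pedestrian). A sketch of how that typically looks; the actual notebook code may differ in details:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def get_model_instance_segmentation(num_classes):
    # start from the COCO pre-trained Mask R-CNN
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

    # swap in a box predictor sized for our class count
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # swap in a mask predictor likewise (256 hidden channels is the usual default)
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)
    return model

model = get_model_instance_segmentation(num_classes=2)  # background + pedestrian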

3. Training effect


4. Export ONNX

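The export itself can reuse the get_pytorch_onnx_model function defined earlier, applied to the fine-tuned weights. A minimal sketch, where the checkpoint name mask_rcnn_pedestrian.pth is hypothetical:

import torch

# rebuild the two-class model and load the fine-tuned weights
model = get_model_instance_segmentation(num_classes=2)
model.load_state_dict(torch.load("mask_rcnn_pedestrian.pth", map_location="cpu"))

# reuse the export helper to produce the ONNX file for LabVIEW
print(get_pytorch_onnx_model(model))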


Summary

The above is what I wanted to share with you today. You can follow the WeChat public account VIRobotics and reply with the keyword "Mask R-CNN image instance segmentation source code" to obtain the complete project source code and model for this post.
