Foreword
In previous posts I introduced how to use the LabVIEW toolkit for image classification and object detection. Today we will look at how to use LabVIEW to implement Mask R-CNN image instance segmentation.
1. What is image instance segmentation?
Image instance segmentation (Instance Segmentation) is a further refinement of semantic segmentation (Semantic Segmentation): it separates objects from the background and achieves object separation at the pixel level. Semantic segmentation and instance segmentation are two different concepts: semantic segmentation only distinguishes and segments objects of different categories, while instance segmentation further separates different instances within the same category.
Some common tasks in computer vision (classification, detection, semantic segmentation, instance segmentation)
2. What is Mask R-CNN
Mask R-CNN is an instance segmentation (Instance Segmentation) algorithm that can be used for object detection, object instance segmentation, and object keypoint detection. The algorithm proceeds as follows:
- First, input the image to be processed and perform the corresponding preprocessing (or use an already preprocessed image);
- Feed it into a pre-trained backbone network (ResNeXt, etc.) to obtain the corresponding feature map;
- Set a predefined ROI for each point in the feature map, obtaining multiple candidate ROIs;
- Send these candidate ROIs to the RPN network for binary classification (foreground or background) and bounding-box (BB) regression, filtering out some of the candidates;
- Perform the ROIAlign operation on the remaining ROIs (that is, first align the pixels of the original image with the feature map, then map each ROI to a fixed-size feature);
- Finally, classify these ROIs (N-category classification), perform BB regression, and generate masks (an FCN is run inside each ROI).
3. LabVIEW calls the Mask R-CNN image instance segmentation model
1. Mask R-CNN model acquisition and conversion
- Install pytorch and torchvision
- Get the model from torchvision (we use the pre-trained model):

```python
model = models.detection.maskrcnn_resnet50_fpn(pretrained=True)
```

- Convert the model to ONNX:

```python
def get_pytorch_onnx_model(original_model):
    model = original_model
    # directory where the converted model will be saved
    onnx_model_path = dirname
    # name of the converted model
    onnx_model_name = "maskrcnn_resnet50.onnx"
    # create the directory for the converted model
    os.makedirs(onnx_model_path, exist_ok=True)
    # full path to the converted model
    full_model_path = os.path.join(onnx_model_path, onnx_model_name)
    model.eval()

    x = torch.rand(1, 3, 640, 640)
    # export the model to ONNX format
    torch.onnx.export(
        original_model,
        x,
        full_model_path,
        input_names=["input"],
        output_names=["boxes", "labels", "scores", "masks"],
        dynamic_axes={"input": [0, 1, 2, 3], "boxes": [0, 1], "labels": [0],
                      "scores": [0], "masks": [0, 1, 2, 3]},
        verbose=True,
        opset_version=11,
    )
    return full_model_path
```
The complete Python code for obtaining and converting the model is as follows:

```python
import os
import torch
import torch.onnx
from torch.autograd import Variable
from torchvision import models

dirname, filename = os.path.split(os.path.abspath(__file__))
print(dirname)


def get_pytorch_onnx_model(original_model):
    model = original_model
    # directory where the converted model will be saved
    onnx_model_path = dirname
    # name of the converted model
    onnx_model_name = "maskrcnn_resnet50.onnx"
    # create the directory for the converted model
    os.makedirs(onnx_model_path, exist_ok=True)
    # full path to the converted model
    full_model_path = os.path.join(onnx_model_path, onnx_model_name)
    model.eval()

    x = torch.rand(1, 3, 640, 640)
    # export the model to ONNX format
    torch.onnx.export(
        original_model,
        x,
        full_model_path,
        input_names=["input"],
        output_names=["boxes", "labels", "scores", "masks"],
        dynamic_axes={"input": [0, 1, 2, 3], "boxes": [0, 1], "labels": [0],
                      "scores": [0], "masks": [0, 1, 2, 3]},
        verbose=True,
        opset_version=11,
    )
    return full_model_path


model = models.detection.maskrcnn_resnet50_fpn(pretrained=True)
print(get_pytorch_onnx_model(model))
```
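Once the script has produced maskrcnn_resnet50.onnx, it is worth sanity-checking the exported graph with onnxruntime before moving to LabVIEW. This is only a sketch: it assumes the onnxruntime package is installed, and it skips quietly when the exported file is not present.

```python
import os
import numpy as np


def check_onnx_model(path):
    """Smoke-test an exported model: return (output names, output shapes),
    or None if the exported file does not exist yet."""
    if not os.path.exists(path):
        return None
    import onnxruntime as ort  # assumed installed: pip install onnxruntime
    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    # thanks to dynamic_axes, sizes other than the 640x640 dummy input work
    x = np.random.rand(1, 3, 480, 640).astype(np.float32)
    outputs = sess.run(None, {"input": x})
    names = [o.name for o in sess.get_outputs()]
    return names, [y.shape for y in outputs]


print(check_onnx_model("maskrcnn_resnet50.onnx"))
```

If the export succeeded, the output names should be `boxes`, `labels`, `scores`, and `masks`, matching what the LabVIEW side reads later.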
2. LabVIEW calls Mask R-CNN (mask rcnn.vi)
Note: The Mask R-CNN model cannot be loaded with OpenCV's dnn module, because some of its operators are not supported, so we use the LabVIEW Open Neural Network Interactive Toolkit (ONNX) to load and run the model instead.
- onnxruntime loads the ONNX model and selects the acceleration method
- Image preprocessing
- Run inference. The model we use is maskrcnn_resnet50_fpn; it has four outputs, namely boxes, labels, scores, and masks, with the following data types:
- As you can see, labels has type INT64, so our source code needs "Get_Result_int64.vi" with index 1, because labels is the second output (that is, subscript 1);
- The other three outputs can be read as float32. Although the declared data type of masks is uint8, we found in practice that its values are already normalized, so float32 works as well.
- Post-processing and instance segmentation. Because the post-processing involves many steps, it is encapsulated in a subVI, mask_rcnn_post_process.vi; the source code is as follows:
- The overall program framework is as follows:
- The instance segmentation results are shown below. You will notice that the model takes longer to run than the earlier examples, because it must not only locate each object's region but also trace that region's outline. The five people and the basketball are all boxed and segmented in different colors.
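For reference, the core of the post-processing can be sketched in Python/NumPy under the output conventions noted above: boxes (N, 4), labels (N,), scores (N,), and masks (N, 1, H, W) with values already normalized to [0, 1]. The LabVIEW subVI additionally handles drawing the boxes and coloring the masks; the synthetic arrays below merely stand in for a real inference result.

```python
import numpy as np


def postprocess(boxes, labels, scores, masks, score_thresh=0.5, mask_thresh=0.5):
    """Filter detections by confidence and binarize the soft masks."""
    keep = scores >= score_thresh
    # masks[:, 0] drops the singleton channel; thresholding yields 0/1 masks
    binary = (masks[keep, 0] >= mask_thresh).astype(np.uint8)
    return boxes[keep], labels[keep], binary


# Synthetic outputs standing in for a real inference result
boxes = np.array([[10, 10, 50, 80], [0, 0, 5, 5]], np.float32)
labels = np.array([1, 2], np.int64)
scores = np.array([0.9, 0.2], np.float32)
masks = np.random.rand(2, 1, 120, 160).astype(np.float32)

b, l, m = postprocess(boxes, labels, scores, masks)
print(b.shape, l, m.shape)  # (1, 4) [1] (1, 120, 160)
```

The low-confidence second detection is discarded, and each surviving mask becomes a binary image that can be overlaid on the frame in its own color.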
3. LabVIEW calls Mask R-CNN for real-time image segmentation (mask rcnn_camera.vi)
The overall idea is similar to the single-image instance segmentation above, except that a camera is used and a loop is added so that instance segmentation runs on every frame. A 3080-series graphics card can use TensorRT to accelerate inference, which makes the segmentation noticeably smoother. We also found that this model's runtime depends heavily on the number of detections, so if you only need to segment people, choosing a cleaner background makes the overall detection much faster.
4. Mask-RCNN trains its own data set (detection of pedestrians)
1. Preparations
- Training requires a jupyterlab environment; if you have not installed it, do so with pip install jupyterlab
- If you cannot set up the jupyterlab environment locally, you can train in the free GPU environments provided by Colab or Kaggle
- Training source code: mask-rcnn.ipynb
2. Start training
- Run the notebook cells as prompted; the dependency files and dataset are downloaded automatically (or manually) and a dataset parsing class is created
- Define the single-epoch training function: the network structure directly uses the one available in torchvision and is not redefined
- The following output indicates that training is in progress
- Change this file name to your own image's name and run it to see the training results
3. Training effect
4. Export ONNX
Summary
That is all I wanted to share with you today. You can follow the WeChat official account VIRobotics and reply with the keyword "Mask R-CNN image instance segmentation source code" to obtain the complete project source code and models for this post.