Matlab simulation of people detection algorithm based on Mask-RCNN deep learning network

Table of contents

1. Overview of algorithm theory

2. Some core programs

3. Algorithm running software version

4. Algorithm operation rendering preview

5. Algorithm complete program engineering


1. Overview of algorithm theory

        The person detection algorithm based on the Mask-RCNN deep learning network detects human targets in images. It combines object detection and instance segmentation, so it can both accurately locate human objects and generate pixel-level masks for them. Mask-RCNN is a deep-learning object detector that extends Faster-RCNN: by adding a Mask Head network, it predicts a semantic segmentation mask for each candidate box, achieving accurate segmentation and recognition of objects. The network consists mainly of two parts: the Region Proposal Network (RPN), which generates candidate boxes, and the Mask Head, which predicts a segmentation mask for each candidate box. The RPN first applies convolution and pooling operations to the input image and then produces a set of candidate boxes for subsequent detection and segmentation. The Mask Head takes the RPN's candidate boxes as input, predicts a semantic segmentation mask for each one, and finally outputs the category and mask of each object.

      The basic structure of Mask-RCNN is shown in the figure below:

The implementation steps of Mask-RCNN are as follows:

  1. Data preparation: First, prepare a training dataset of person images annotated with bounding boxes and masks. Class labels must also be defined, such as "person" and "background".

  2. Network architecture: Mask-RCNN is a deep learning model based on convolutional neural networks (CNNs). It consists of two subnetworks: the Region Proposal Network (RPN), which generates candidate object boxes, and the mask subnetwork, which generates pixel-level masks of objects. The two subnetworks share feature-extraction layers for computational efficiency.

  3. Object detection: First, the RPN generates candidate object boxes. It produces multiple candidates via sliding windows and anchor boxes and assigns a score to each box; this can be implemented with convolutional and fully connected layers. Boxes whose scores exceed a threshold are retained as object candidates. The RPN converts each box's score into an objectness probability:
    $$P_{object} = \frac{1}{1 + e^{-r}}$$
    where $r$ is the score computed from the box's feature vector, indicating how likely the candidate box is to contain foreground rather than background.
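As a language-agnostic illustration (in Python rather than the article's MATLAB), the objectness probability above is just a sigmoid applied to the RPN's score $r$:

```python
import math

def objectness_probability(r):
    """Sigmoid of the RPN score r: P(object) = 1 / (1 + exp(-r))."""
    return 1.0 / (1.0 + math.exp(-r))

# A score of 0 gives an even 0.5 probability; large positive scores
# approach 1 (confident foreground), large negative scores approach 0.
print(objectness_probability(0.0))   # 0.5
print(objectness_probability(4.0) > 0.95)
```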

  4. Object classification: For each candidate object box, a classifier assigns it to one of the predefined class labels, such as "person" or "background". This can be implemented with convolutional and fully connected layers.

  5. Object localization: For each candidate object box, a regressor adjusts the box's position to match the object's bounds more accurately. The regressor outputs position offsets for the box: horizontal and vertical displacements plus scaling factors for its width and height.
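A minimal Python sketch of how such regression offsets are applied, using the standard (dx, dy, dw, dh) parameterization from the Faster-RCNN family (the helper name and exact convention are illustrative assumptions, not code from the article):

```python
import math

def apply_box_deltas(box, deltas):
    """Apply regression offsets (dx, dy, dw, dh) to a box given as
    (x_center, y_center, width, height)."""
    x, y, w, h = box
    dx, dy, dw, dh = deltas
    # Translate the center by fractions of the box size,
    # and rescale width/height exponentially.
    return (x + dx * w, y + dy * h, w * math.exp(dw), h * math.exp(dh))

# Zero deltas leave the box unchanged.
print(apply_box_deltas((50.0, 50.0, 20.0, 40.0), (0.0, 0.0, 0.0, 0.0)))
```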

  6. Instance segmentation: For each candidate object box, the mask subnetwork generates a pixel-wise mask of the object by applying ROI pooling and convolution operations to the candidate box. The mask represents the precise location of the object in the image.
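The ROI pooling step mentioned above can be sketched as follows: a rectangular region of a feature map is divided into a fixed grid of bins and max-pooled per bin, so every candidate box yields a fixed-size feature regardless of its original size. This is a simplified illustrative Python version (real implementations operate on multi-channel tensors and use ROI Align):

```python
def roi_max_pool(feature_map, roi, out_size=2):
    """Crudely max-pool a rectangular region of a 2D feature map
    (list of lists) into an out_size x out_size grid."""
    r0, c0, r1, c1 = roi  # inclusive-exclusive row/col bounds
    h, w = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Integer bin boundaries within the ROI.
            rs = r0 + i * h // out_size
            re = r0 + (i + 1) * h // out_size
            cs = c0 + j * w // out_size
            ce = c0 + (j + 1) * w // out_size
            row.append(max(feature_map[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled

fm = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
print(roi_max_pool(fm, (0, 0, 4, 4)))  # [[6, 8], [14, 16]]
```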

  7. Loss function: During training, a loss function measures the accuracy of object detection and mask prediction; commonly used losses include cross-entropy loss and smooth L1 loss. The Mask Head's loss function is:
    $$L_{mask} = \frac{1}{N_{mask}}\sum_{i=1}^{N_{mask}}L_{binary}(m_i,\hat{m}_i) + \frac{1}{N_{cls}}\sum_{i=1}^{N_{cls}}L_{cls}(c_i,\hat{c}_i)$$
    where $L_{binary}$ is the binary cross-entropy loss, $L_{cls}$ is the softmax (cross-entropy) classification loss, $m_i$ and $\hat{m}_i$ are the true and predicted masks, and $c_i$ and $\hat{c}_i$ are the true and predicted categories, respectively.
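The two loss terms can be illustrated with a toy Python sketch (a simplified per-box version with hypothetical helper names; the real Mask-RCNN loss averages over many boxes and per-class mask channels):

```python
import math

def binary_ce(m, m_hat, eps=1e-12):
    """Binary cross-entropy between true mask pixels m and predicted
    probabilities m_hat, averaged over pixels."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(m, m_hat)) / len(m)

def softmax_ce(true_idx, logits):
    """Softmax cross-entropy for one box: -log softmax(logits)[true_idx]."""
    mx = max(logits)  # subtract the max for numerical stability
    z = sum(math.exp(l - mx) for l in logits)
    return -(logits[true_idx] - mx - math.log(z))

# Toy mask of 4 pixels and a 2-class ("background", "person") prediction.
mask_loss = binary_ce([1, 0, 1, 1], [0.9, 0.2, 0.8, 0.7])
cls_loss = softmax_ce(1, [0.5, 2.0])  # true class is "person"
print(mask_loss + cls_loss)
```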

  8. Network training: Train the Mask-RCNN network on the training dataset, optimizing the network parameters by backpropagation and gradient descent to minimize the loss function. Data augmentation techniques can be used during training to increase the diversity of the data samples.

  9. Object detection and segmentation: In the test phase, the trained network is applied to new images. A forward pass performs object detection and instance segmentation on the image, and the boxes and masks output by the model locate the human target accurately and provide a pixel-level mask. The Mask Head's mask prediction is:
    $$\hat{m}_{h,w} = \frac{1}{Z}\sum_{(u,v)\in R} f_{u,v}(h,w)$$
    where $Z$ is a normalization factor, $R$ is the region of the candidate box, and $f_{u,v}$ is the feature vector at position $(u,v)$ on the feature map.
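A toy Python illustration of that normalized sum, collapsing the per-output-pixel dependence of $f_{u,v}(h,w)$ to a single scalar per position for brevity (the function name and simplification are assumptions for illustration only):

```python
def mask_logit(feature_map, roi):
    """Average the feature responses over the candidate-box region R,
    i.e. the normalized sum with Z = |R| (the number of positions in R)."""
    r0, c0, r1, c1 = roi  # inclusive-exclusive row/col bounds
    vals = [feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(vals) / len(vals)

fm = [[1.0, 2.0],
      [3.0, 4.0]]
print(mask_logit(fm, (0, 0, 2, 2)))  # 2.5
```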

        The difficulty in implementing Mask-RCNN lies in the design of the network architecture and the optimization of the training process. The network design must choose the structure of the convolutional and fully connected layers sensibly and account for detecting objects at different scales. Training requires an appropriate optimization algorithm and learning-rate schedule, together with suitable data augmentation and regularization, to improve the network's generalization and robustness. In addition, choosing a suitable loss function and balancing the importance of the object detection and instance segmentation tasks are further challenges.

2. Some core programs

for i = 1:20
    img = imread(file_list(i).name);          % read the image
    imgSize = size(img);                      % get the image dimensions
    [~, maxDim] = max(imgSize);               % find the largest dimension
    resizeSize = [NaN NaN];                   % resized dimensions (NaN preserves aspect ratio)
    resizeSize(maxDim) = targetSize(maxDim);  % match the target size along the largest dimension

    img = imresize(img, resizeSize);          % resize the image
    % run Mask R-CNN detection
    [boxes, scores, labels, masks] = detectMaskRCNN(net, maskSubnet, img, params, SimuEnv);

    if isempty(masks)
        overlayedImage = img;                 % no masks detected: keep the original image
        NAME = 'No person detected';
    else
        overlayedImage = insertObjectMask(img, masks); % draw the masks on the image
        NAME = 'Person detected';
    end
    figure
    imshow(overlayedImage)                    % display the processed image
    showShape("rectangle", gather(boxes), "Label", labels, "LineColor", 'g') % show bounding boxes and labels
    title(NAME);                              % set the title
end

3. Algorithm running software version

matlab2022a

4. Algorithm operation rendering preview
5. Algorithm complete program engineering

Origin blog.csdn.net/aycd1234/article/details/131733336