Target detection system based on YOLOv4 (with MATLAB code + GUI implementation)

Abstract: This article presents an object detection system implemented in MATLAB, with the YOLOv4 detection network as its core model, used to train and detect targets for a variety of tasks and to visualize the detection results in a GUI. The article describes the implementation of YOLOv4 in detail, covering the algorithm principle, the MATLAB implementation code, the training dataset, the training process, and the graphical user interface. In the GUI, users can select images, videos, or a camera for detection and recognition, and the detection model can be swapped out. Complete MATLAB code and a usage tutorial are provided, suitable for beginners to refer to; for the complete code and resource files, please see the download link at the end of the article.

Target detection system based on YOLOv4 - MATLAB GUI version - demonstration and introduction

➷ Click to jump to the download link for all the complete code files at the end of the article ☇


1. Introduction

        When the blogger started learning artificial intelligence ten years ago, high-quality, complete tutorials and reference blogs were scarce on the Internet; there was almost nothing to consult when implementing a complex piece of code, and one often had to grope in the dark. The original intention of this blog is to share technical knowledge and provide inspiration for beginners. I hope that the examples and explanations here will stimulate readers' interest and enthusiasm and help them better understand and apply the related techniques. As the saying goes, "learn widely and hold your purpose firmly; inquire earnestly and reflect deeply" — I also hope readers will not stop thinking while reading, but will try to solve problems on their own and put forward new viewpoints and ideas after mastering the basic principles and techniques. You may encounter challenges and difficulties along the way; resolving a bug may be exactly the moment your skills improve and your knowledge expands. The content of this blog is original, and the relevant citations and references are marked in the text. Considering that professional readers may see it, this blog is written as far as possible in the format of an academic journal article. If you need to cite it, the citation formats are as follows:

[1] Unlimited thoughts. Target detection system based on YOLOv4 (with MATLAB code + GUI implementation) [J/OL]. CSDN, 2023.05. https://wuxian.blog.csdn.net/article/details/130470598.
[2] Wu, S. (2023, May). Object Detection System Based on YOLOv4 (with MATLAB Code and GUI Implementation) [J/OL]. CSDN. https://wuxian.blog.csdn.net/article/details/130470598.

        Object detection, an important research direction in computer vision, aims to identify and localize objects of specific categories in images or videos (Redmon et al., 2016) [1]. In recent years, with the development of deep learning, object detection methods based on convolutional neural networks (CNNs) have made remarkable progress. Classic object detection methods include R-CNN (Girshick et al., 2014) [2], Fast R-CNN (Girshick, 2015) [3], Faster R-CNN (Ren et al., 2015) [4], SSD (Liu et al., 2016) [5], and RetinaNet (Lin et al., 2017) [6]. These methods have achieved good results on benchmark datasets such as PASCAL VOC (Everingham et al., 2010) [7], COCO (Lin et al., 2014) [8], and ImageNet (Russakovsky et al., 2015) [9]. Compared with other methods, the YOLO series of algorithms (Redmon et al., 2016; Redmon & Farhadi, 2017; Redmon & Farhadi, 2018; Bochkovskiy et al., 2020) [1, 10-12] places greater emphasis on detection speed and real-time performance, giving it a clear advantage in many practical application scenarios.

        Although the above methods have achieved remarkable results in object detection, each has certain limitations. For example, the R-CNN family excels in detection accuracy, but its high computational complexity leads to slow detection (Girshick et al., 2014; Girshick, 2015; Ren et al., 2015) [2-4]. In contrast, one-stage methods such as SSD and RetinaNet are faster but relatively less accurate (Liu et al., 2016; Lin et al., 2017) [5, 6]. The YOLO series strikes a good balance between detection speed and accuracy (Redmon et al., 2016; Redmon & Farhadi, 2017; Redmon & Farhadi, 2018; Bochkovskiy et al., 2020) [1, 10-12]. In particular, the YOLOv4 algorithm has become an important method in the field thanks to its high detection accuracy and speed (Bochkovskiy et al., 2020) [12].

        At present, many researchers and engineers have successfully applied YOLOv4 to practical scenarios such as autonomous driving (Geiger et al., 2012) [13], video surveillance (Sindhu, 2021) [14], and medical imaging (Shewajo & Fante, 2023) [15]. However, despite YOLOv4's impressive results on object detection tasks, relatively few studies address its implementation in the MATLAB environment (MathWorks, 2021) [16]. A MATLAB-based YOLOv4 object detection system is highly practical and provides convenient development and debugging tools for researchers and engineers in computer vision and image processing. The main contributions of this blog are therefore as follows:

  1. Provide a MATLAB-based YOLOv4 object detection system with a user-friendly interface that supports multiple detection modes, including image detection, batch detection, video detection, and real-time camera detection;
  2. Describe in detail the dataset format required for training the YOLOv4 model in the MATLAB environment, using a custom animal recognition dataset as an example;
  3. Provide the training code for the YOLOv4 model and demonstrate its performance through training curves and model evaluation results;
  4. Elaborate the design framework and implementation principles of the system in combination with the GUI interface.

2. System Interface Demonstration

        To make target detection convenient for users, we developed a MATLAB-based YOLOv4 target detection system with a user-friendly interface. The system supports the following functions:

(1) Select image detection: The user can select a single image for object detection, and the system will recognize the object in the image and mark the bounding box and category of the object on the image.


(2) Select a folder for batch detection: The user can select a folder for batch detection; the system will automatically detect all the images in the folder and save the results to the specified output folder.


(3) Select video detection: The user can select a video file for object detection, and the system will recognize the object in the video in real time and mark the bounding box and category of the object on the video screen.


(4) Invoke camera detection: The user can enable the computer camera for real-time object detection; the system will recognize objects in the frames captured by the camera in real time and mark each object's bounding box and category on the image.


(5) Switch between different network models: Users can choose different YOLOv4 pretrained models for detection according to their needs, so as to adapt to different detection tasks and performance requirements.


(6) Display results and visualization through the interface: the system interface will intuitively display the detection results, including the bounding box, category and confidence score of the object. At the same time, users can view the visualization of the detection process through the interface to better understand the detection performance of the model.



3. Dataset Format Introduction

        To train the YOLOv4 model in the MATLAB environment, a suitable dataset must first be prepared. The dataset should contain a large number of labeled images so that the trained model learns to recognize different classes of objects. This section describes in detail the annotation file format that MATLAB officially supports for YOLOv4 training, using a custom animal recognition dataset as an example.

        In MATLAB, the annotation information required for YOLOv4 training is stored as a table. Each row represents one sample (that is, one image), and each column holds a specific piece of information. The first column is the path of the image file; from the second column onward, each column holds the annotation information for one category, namely the coordinates of the bounding boxes of that category. If an image contains multiple bounding boxes of the same class, they are represented as a 2-D array with one box per row. If a category does not appear in an image, it is represented by an empty array ([]).

Taking the custom animal recognition dataset as an example, the dataset is structured as follows.

        In this example, there are 6 classes: bird, cat, cow, dog, horse, and sheep. The annotation information for each category consists of the upper-left corner coordinates (x, y) of each bounding box together with its width and height (w, h).
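
For illustration, the following minimal sketch builds a table in this format with hypothetical file names and box coordinates (each box is an [x y w h] row in pixels); the actual annotation tables are loaded from .mat files in the next section:

% Hypothetical example of the annotation table format (not the real dataset)
imageFilename = {'images/001.jpg'; 'images/002.jpg'};
bird = {[45 60 120 90]; []};               % image 1 contains one bird
dog  = {[]; [30 40 80 60; 150 35 70 55]};  % image 2 contains two dogs
Dataset = table(imageFilename, bird, dog)  % one row per image, one column per class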


4. Model Training Code

        In this section, we introduce how to train the YOLOv4 model in MATLAB using the custom animal recognition dataset prepared in the previous section. First, load the training, validation, and test sets and add the full paths to the image files. The following MATLAB code loads the dataset:

% Load the datasets
data = load("data/Animal_dataset_train.mat");
trainData = data.Dataset;  % training set

data = load("data/Animal_dataset_val.mat");
validData = data.Dataset;  % validation set

data = load("data/Animal_dataset_test.mat");
testData = data.Dataset;  % test set

% Prepend the full path to the image filenames
dataDir = fullfile(pwd, 'data');
trainData.imageFilename = fullfile(dataDir, trainData.imageFilename);
validData.imageFilename = fullfile(dataDir, validData.imageFilename);
testData.imageFilename = fullfile(dataDir, testData.imageFilename);

        Next, create datastores using imageDatastore and boxLabelDatastore to load image and label data during training and evaluation.

% Create the datastores
imdsTrain = imageDatastore(trainData{:, "imageFilename"});
bldsTrain = boxLabelDatastore(trainData(:, 2:end));
imdsValidation = imageDatastore(validData{:, "imageFilename"});
bldsValidation = boxLabelDatastore(validData(:, 2:end));
imdsTest = imageDatastore(testData{:, "imageFilename"});
bldsTest = boxLabelDatastore(testData(:, 2:end));

% Combine the image and label datastores
trainingData = combine(imdsTrain, bldsTrain);
validationData = combine(imdsValidation, bldsValidation);
testData = combine(imdsTest, bldsTest);  % reuses the name testData for the combined datastore
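
As an optional sanity check (not part of the original listing), you can read one sample from the combined datastore and preview its ground-truth boxes before training:

% Optional: read one {image, boxes, labels} sample and preview the boxes
sample = read(trainingData);
I = insertShape(sample{1}, "rectangle", sample{2});  % draw ground-truth boxes
figure, imshow(I)
reset(trainingData);  % rewind the datastore before training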

        To train the YOLOv4 model, the input images must be resized to the network input size, and anchor boxes are estimated from the training data for the specified number of anchors.

inputSize = [320 224 3];  % network input size
classes = {'bird', 'cat', 'cow', 'dog', 'horse', 'sheep'};
numAnchors = 6;

% Preprocess the data (resize images and boxes to the input size)
trainingDataForEstimation = transform(trainingData, @(data)preprocessData(data, inputSize));
[anchors, meanIoU] = estimateAnchorBoxes(trainingDataForEstimation, numAnchors);

% Assign anchor boxes to the two detection heads (larger anchors first)
area = anchors(:,1) .* anchors(:,2);
[~, idx] = sort(area, "descend");
anchors = anchors(idx, :);
anchorBoxes = {anchors(1:3, :); anchors(4:6, :)};
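
The preprocessData helper called above is not listed in this post. A minimal sketch, assuming it simply resizes each image to the network input size and rescales the boxes accordingly (following the pattern of the official MATLAB YOLO v4 example), might look like this:

function data = preprocessData(data, targetSize)
% Resize images to the target size and rescale the boxes to match.
for ii = 1:size(data, 1)
    I = data{ii, 1};
    imgSize = size(I);
    scale = targetSize(1:2) ./ imgSize(1:2);
    I = im2single(imresize(I, targetSize(1:2)));
    bboxes = bboxresize(data{ii, 2}, scale);  % rescale [x y w h] boxes
    data(ii, 1:2) = {I, bboxes};
end
end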

        Next, create a YOLOv4 object detector from a pretrained YOLOv4 detection network trained on the COCO dataset. Before training, data augmentation such as random horizontal flipping, random scaling, and color transformation can optionally be applied. Then set the training parameters, such as the learning rate, batch size, and maximum number of epochs.

% Create a YOLOv4 object detector from a pretrained network trained on COCO
detector = yolov4ObjectDetector("tiny-yolov4-coco", classes, anchorBoxes, InputSize=inputSize);

if flag_augment  % apply data augmentation
    augmentedTrainingData = transform(trainingData, @augmentData);  % attach augmentation to the datastore
    % Preview the augmentation results
    augmentedData = cell(4, 1);
    for k = 1:4
        data = read(augmentedTrainingData);
        augmentedData{k} = insertShape(data{1}, "rectangle", data{2});
        reset(augmentedTrainingData);
    end
    figure
    montage(augmentedData, BorderSize=10)  % show the augmented samples
end
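
The augmentData helper attached to the datastore above is likewise not listed. A plausible sketch in the style of the official MATLAB YOLO v4 example (random color jitter, horizontal flip, and slight scaling; the exact augmentations used here are an assumption) is:

function data = augmentData(A)
% Randomly jitter color, flip, and scale each {image, boxes, labels} sample.
data = cell(size(A));
for ii = 1:size(A, 1)
    I = A{ii, 1};
    bboxes = A{ii, 2};
    labels = A{ii, 3};
    sz = size(I);
    if numel(sz) == 3 && sz(3) == 3  % color jitter for RGB images only
        I = jitterColorHSV(I, "Contrast", 0.0, "Hue", 0.1, ...
            "Saturation", 0.2, "Brightness", 0.2);
    end
    % Random horizontal flip and scaling, applied to image and boxes alike
    tform = randomAffine2d(XReflection=true, Scale=[1 1.1]);
    rout = affineOutputView(sz, tform, BoundsStyle="centerOutput");
    I = imwarp(I, tform, OutputView=rout);
    [bboxes, indices] = bboxwarp(bboxes, tform, rout, OverlapThreshold=0.25);
    labels = labels(indices);
    if isempty(indices)
        data(ii, :) = A(ii, :);  % keep the original sample if all boxes were lost
    else
        data(ii, :) = {I, bboxes, labels};
    end
end
end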

% Training options
options = trainingOptions("adam", ...
    ExecutionEnvironment=exe_env,...
    GradientDecayFactor=0.9,...
    SquaredGradientDecayFactor=0.999,...
    InitialLearnRate=0.001,...
    LearnRateSchedule="none",...
    MiniBatchSize=16,...
    L2Regularization=0.0005,...
    MaxEpochs=300,...
    BatchNormalizationStatistics="moving",...
    DispatchInBackground=true,...
    ResetInputNormalization=false,...
    Shuffle="every-epoch",...
    VerboseFrequency=20,...
    CheckpointPath='./checkPoint/',...
    CheckpointFrequency=10, ...
    ValidationData=validationData, ...
    OutputNetwork='best-validation-loss' ...
    );

% options = trainingOptions("sgdm", ...
%     ExecutionEnvironment=exe_env, ...
%     InitialLearnRate=0.001, ...
%     MiniBatchSize=16,...
%     MaxEpochs=300, ...
%     BatchNormalizationStatistics="moving",...
%     ResetInputNormalization=false,...
%     VerboseFrequency=30);

% Run the training procedure
if doTraining       
    % Train the YOLO v4 detector
    if flag_augment  % with data augmentation
        [detector, info] = trainYOLOv4ObjectDetector(augmentedTrainingData, detector, options);
    else
        if if_checkPoint  % resume from a saved checkpoint
            load(checkpoint_path);
            [detector, info] = trainYOLOv4ObjectDetector(trainingData, net, options);
        else
            [detector, info] = trainYOLOv4ObjectDetector(trainingData, detector, options);
        end
    end
else
    % Otherwise load a pretrained model
    pretrained = load('yolov4_tiny.mat');
    detector = pretrained.detector;
end

        The output information during the training process is as follows:

*************************************************************************
Training a YOLO v4 Object Detector for the following object classes:
* bird
* cat
* cow
* dog
* horse
* sheep
 
    Epoch    Iteration    TimeElapsed    LearnRate    TrainingLoss    ValidationLoss
    _____    _________    ___________    _________    ____________    ______________
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).
      1         20         00:01:40        0.001         62.356                     
      1         40         00:01:48        0.001         25.72                      
      1         60         00:02:01        0.001         19.095                     
      1         80         00:02:08        0.001         21.819                     
      2         100        00:02:29        0.001         14.169          0.67991    
      2         120        00:02:43        0.001         19.108                     
 ...

        In this process, first choose whether to perform data augmentation. Then set the training parameters, such as the execution environment (GPU), learning rate, batch size, and maximum number of epochs. Next, either train as configured or load a pretrained model. After training completes, the model is saved as animal_tiny_yolov4.mat. Finally, the trained model is run on the test set to evaluate detection accuracy, and the test results and training curves are saved. The whole workflow thus consists of preprocessing the data, augmenting it, setting the training parameters, running the training, and evaluating detection accuracy; the code above shows how to implement these steps in MATLAB.
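
The saving and evaluation steps just described are not shown in the listing above. A hedged sketch of how they might look in R2022a (variable names are illustrative) is:

% Save the trained detector
save('animal_tiny_yolov4.mat', 'detector');

% Run the detector over the test datastore and compute per-class average precision
detectionResults = detect(detector, testData, Threshold=0.01);
[ap, recall, precision] = evaluateDetectionPrecision(detectionResults, testData);

% Plot the precision-recall curve for the first class
figure
plot(recall{1}, precision{1})
xlabel('Recall'), ylabel('Precision')
title(sprintf('%s: AP = %.2f', classes{1}, ap(1)))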


5. System Implementation

        In this section, we describe in detail how to combine the YOLOv4 object detector with a graphical user interface (GUI) to build a friendly, easy-to-use animal recognition system. With the GUI, users can conveniently upload images, select model parameters, and view recognition results. The design framework and implementation principles are as follows. The system consists of the following parts:

  1. Graphical user interface (GUI): provides the interface through which users interact with the system, including image upload, model selection, and result display.
  2. Image processing module: preprocesses the images uploaded by users to meet the input requirements of the YOLOv4 model.
  3. Detector module: uses the trained YOLOv4 animal object detector to recognize animal categories.
  4. Result processing module: post-processes the detection results so that they can be displayed on the GUI (a minimal sketch of this pipeline follows this list).

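As mentioned in item 4, the interplay of the image processing, detector, and result processing modules can be summarized by the following minimal sketch (the variable and property names, such as app.UIAxes, are illustrative and not the app's actual code):

% Minimal sketch of the detect-and-display pipeline behind the GUI
I = imread('test.jpg');                          % image chosen by the user
[bboxes, scores, labels] = detect(detector, I);  % detector module
if ~isempty(bboxes)                              % result processing module
    annotations = string(labels) + ": " + string(round(scores, 2));
    I = insertObjectAnnotation(I, "rectangle", bboxes, annotations);
end
imshow(I, 'Parent', app.UIAxes);                 % display on the GUI axes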

        Our GUI design aims to give users a simple, intuitive interface. The main elements include the menu bar, image display area, parameter setting area, and result display area, arranged compactly and in an orderly manner so that users can easily upload images, set parameters, and view results. The main controls involved in the GUI are as follows:

  1. Image selection button: when clicked, a file chooser opens for the user to select an image. The selected image is then displayed in the image display axes.
  2. Video selection button: when clicked, a file chooser opens for the user to select a video file. The selected video is played in the image display axes while the recognition results are shown in real time.
  3. Camera button: when clicked, the system turns on the computer camera and captures the video stream in real time. The captured video is displayed in the image display axes with the recognition results overlaid in real time.
  4. Model replacement button: when clicked, the system pops up a dialog box for the user to select a new model file. Once selected, the new model is used for subsequent recognition tasks.
  5. Image display axes: used to display the uploaded image, the selected video, or the live camera stream, with the recognition results drawn on the image.
  6. Result display area: shows the detected animal category, confidence, and other information; users can review the recognition results here.

        To implement the GUI's interactive behavior, a series of callback functions must be written. The main callbacks and their roles are as follows:


  1. Image selection callback: triggered when the user clicks the image selection button. It opens the file chooser, lets the user select an image, and displays the image in the image display axes (a sketch of this callback follows the list).
  2. Video selection callback: triggered when the user clicks the video selection button. It opens the file chooser, lets the user select a video file, and plays the video in the image display axes while showing the recognition results in real time.
  3. Camera callback: triggered when the user clicks the camera button. It turns on the computer camera, captures the video stream, and displays the recognition results in the image display axes in real time.
  4. Model replacement callback: triggered when the user clicks the model replacement button. It pops up a dialog box for the user to select a new model file and applies the new model to subsequent recognition tasks.
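
As referenced in item 1, a hedged sketch of what the image selection callback might look like in App Designer follows (app.detector and app.UIAxes are assumed property names; the actual Detector_UI.mlapp code may differ):

function ImageButtonPushed(app, event)
    % Let the user choose an image file; bail out if the dialog is cancelled
    [file, path] = uigetfile({'*.jpg;*.png;*.bmp', 'Image Files'});
    if isequal(file, 0)
        return;
    end
    I = imread(fullfile(path, file));
    % Run the currently loaded detector and annotate the result
    [bboxes, scores, labels] = detect(app.detector, I);
    if ~isempty(bboxes)
        I = insertObjectAnnotation(I, "rectangle", bboxes, ...
            string(labels) + ": " + string(round(scores, 2)));
    end
    imshow(I, 'Parent', app.UIAxes);  % show the annotated image in the app axes
end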

        With the above design, an easy-to-use and fully functional graphical user interface is realized. Through this interface, users can conveniently upload images, select videos, turn on the camera, replace the model, and view the results, thereby completing the animal recognition task.


6. Summary and Outlook

        This article introduced a target detection system based on YOLOv4. First, the annotation format and preprocessing of the dataset were described, including image annotation and data splitting. Then, an animal recognition model was built on the YOLOv4 detection network, with the key steps of training — parameter setting, anchor box estimation, and data augmentation — described in detail. Finally, the key technologies of the system implementation were discussed, including network design and GUI design, and a MATLAB-based graphical user interface was presented that makes animal recognition tasks convenient for users.

        Although the animal recognition system proposed in this paper has achieved good results, there are still some areas that can be improved and optimized. In future research, we will focus on the following aspects:

  1. Richer datasets: In order to improve the generalization ability of the model, the dataset can be expanded by collecting image data of more animal species and scenes. At the same time, you can try to use semi-supervised or unsupervised learning methods to make full use of unlabeled data.
  2. More advanced detection algorithms: With the development of deep learning technology, you can try to apply more advanced detection algorithms to animal recognition tasks to improve the accuracy and real-time performance of the model.
  3. Multi-modal information fusion: Considering that multiple modal information may be involved in the animal recognition process, such as sound, behavior, etc., it is possible to study how to integrate these information into the model to improve the recognition performance.
  4. Real-time recognition and tracking: For animal recognition and tracking tasks in real-time video streams, more efficient algorithms and techniques can be researched to reduce latency and improve tracking stability.
  5. Model deployment and optimization: In order to achieve efficient animal recognition on different platforms, technologies such as model compression and hardware acceleration can be studied to meet the needs of different scenarios.

Download Link

    If you would like to obtain the complete program files involved in this post (including test images, videos, mlx and mlapp files, etc.), they have been packaged and uploaded to the blogger's mianbaoduo (Bread Multi) page; see the reference blog post and video below. All the involved files are packaged together and the program is ready to run with one click.



Note: This code was developed with MATLAB R2022a and has been tested to run successfully. The main program of the GUI is Detector_UI.mlapp; run test_video.m to test video detection and test_camera.m to test camera detection. To ensure the program runs smoothly, please use MATLAB R2022a and install the required add-ons via the Add-On Manager (MATLAB menu bar -> Home -> Add-Ons -> Manage Add-Ons).


The complete resource includes the datasets and training code. For environment configuration and for how to modify the text, images, logo, etc. in the interface, please refer to the video. To download the complete project files, see the reference blog post or the video introduction: ➷➷➷

Reference blog post: https://zhuanlan.zhihu.com/p/626659942/

Reference video demonstration: https://www.bilibili.com/video/BV1ts4y1X71R/


Conclusion

        It is impossible to declare absolutely that a program is bug-free. Although we have debugged the program as thoroughly as possible and found no issues in the current operating environment, factors such as computer configuration, operating system, and MATLAB version may all affect how it runs. If you encounter problems, please stay calm, carefully check your steps, and look for a solution scientifically and rationally; do not let impatience or frustration dampen your enthusiasm for learning.

        Given the blogger's limited ability, omissions are inevitable even though the methods in this post have been tested. I hope readers will kindly point out any mistakes so that the next revision can be presented in a more complete and rigorous form, and if you know of a better way to implement something, please let me know.


References

[1] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).

[2] Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580-587).

[3] Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 1440-1448).

[4] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91-99).

[5] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). SSD: Single shot multibox detector. In European conference on computer vision (pp. 21-37). Springer, Cham.

[6] Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988).

[7] Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The Pascal visual object classes (VOC) challenge. International journal of computer vision, 88(2), 303-338.

[8] Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., … & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

[9] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., … & Berg, A. C. (2015). Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3), 211-252.

[10] Redmon, J., & Farhadi, A. (2017). YOLO9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263-7271).

[11] Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.

[12] Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.

[13] Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3354-3361).

[14] Sindhu, V. S. (2021). Vehicle identification from traffic video surveillance using YOLOv4. In 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS) (pp. 1768-1775). IEEE.

[15] Shewajo, F. A., & Fante, K. A. (2023). Tile-based microscopic image processing for malaria screening using a deep learning approach. BMC Medical Imaging, 23(1), 1-14.

[16] MathWorks. (2021). Object Detection Using YOLO v2 Deep Learning. Retrieved from https://www.mathworks.com/help/vision/ug/object-detection-using-yolo-v2-deep-learning.html.
