High-precision real-time deep-sea fish detection and recognition system based on the YOLOv7 algorithm (PyTorch + PySide6 + YOLOv7)

Abstract: This high-precision real-time deep-sea fish detection system, built on the YOLOv7 algorithm, detects and localizes deep-sea fish targets in everyday use. It can perform detection and recognition on input images, videos, image folders, and camera streams, and it supports visualization and export of the detection results. The system trains the dataset with the YOLOv7 object detection algorithm, builds the desktop interface with the PySide6 framework, and can load model weights in PT and ONNX formats as the prediction model. The functions implemented include: selection and initialization of model weights; adjustment of the detection confidence threshold and the post-processing IoU threshold; image import, detection, result visualization, and target statistics; video import, detection, result visualization, and target statistics; batch import of images from a folder, detection, result visualization, and target statistics; camera import, detection, result visualization, and target statistics; and display of inference time for single images, videos, and camera streams. This post describes the system's environment setup, overall functionality, and a demonstration, and it provides the complete Python source code and a usage tutorial. It is suitable as a reference for newcomers and supports secondary development. The complete code and resource files for the whole system can be obtained from the download link at the end of the article.

Introduction to the principle of YOLOv7 algorithm

YOLOv7 was released in July 2022, and its paper was accepted at CVPR 2023, a top computer vision conference. Links to YOLOv7 are posted on the official YOLOv3 and YOLOv4 repositories, which shows that it has been endorsed by the original YOLO authors. According to the official results, at comparable model sizes YOLOv7 is more accurate than YOLOv5 and 120% faster (FPS), 180% faster than YOLOX, 1200% faster than Dual-Swin-T, 550% faster than ConvNeXt, and 500% faster than Swin-L. In the range of 5 FPS to 160 FPS, YOLOv7 surpasses the known detectors of its time in both speed and accuracy (tested on a V100 GPU). The model with 56.8% AP reaches a detection rate above 30 FPS (batch = 1), and at the time of release it was the only detector able to exceed 30 FPS at such high accuracy.
Paper address: https://arxiv.org/pdf/2207.02696.pdf
Source code address: https://github.com/WongKinYiu/yolov7

YOLOv7 model structure

The overall structure of the YOLOv7 model is shown below. Similar to YOLOv5, it can be divided into the Input, Backbone, Neck, Head, and Prediction modules.
[Figure: overall structure of the YOLOv7 model]

This section introduces the new modules in the YOLOv7 family of models:
(1) ReOrg module
This module, used in the yolov7-w6.yaml configuration, performs a slicing operation on the input; it is similar to the PassThrough layer of YOLOv2 and the Focus operation of YOLOv5 (v5.0). It preserves the original input information as much as possible while downsampling it. A minimal sketch of the slicing idea is given below. (In the repository, this code is around line 48 of models/common.py.)
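For concreteness, the following is a minimal PyTorch sketch of the ReOrg slicing idea (an illustrative rewrite rather than a copy of the repository code):

```python
import torch
import torch.nn as nn

class ReOrg(nn.Module):
    def forward(self, x):
        # x: (B, C, H, W) -> (B, 4C, H/2, W/2)
        # Take every other pixel in four phase-shifted patterns and stack them
        # along the channel axis, so the resolution halves without discarding
        # any input values.
        return torch.cat([x[..., ::2, ::2],     # top-left pixel of each 2x2 block
                          x[..., 1::2, ::2],    # bottom-left pixel
                          x[..., ::2, 1::2],    # top-right pixel
                          x[..., 1::2, 1::2]],  # bottom-right pixel
                         dim=1)

if __name__ == "__main__":
    y = ReOrg()(torch.randn(1, 3, 640, 640))
    print(y.shape)  # torch.Size([1, 12, 320, 320])
```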
(2) Multi-channel convolution (ELAN-style) module
This module stacks a large number of 1×1 point convolutions and 3×3 standard convolutions. The output of each convolution not only serves as the input of the next convolution, it is also concatenated with the outputs of the other convolutions, similar to the connectivity in DenseNet. A simplified sketch follows.
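The sketch below illustrates the connectivity pattern only; the layer names and channel sizes are assumptions for illustration and do not reproduce the exact yolov7.yaml configuration:

```python
import torch
import torch.nn as nn

def conv_bn_act(c_in, c_out, k):
    # basic conv -> batch norm -> SiLU unit
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class ELANLikeBlock(nn.Module):
    def __init__(self, c_in, c_hidden=64):
        super().__init__()
        self.branch1 = conv_bn_act(c_in, c_hidden, 1)    # 1x1 point conv
        self.branch2 = conv_bn_act(c_in, c_hidden, 1)    # 1x1 point conv
        self.conv3 = conv_bn_act(c_hidden, c_hidden, 3)  # chained 3x3 convs
        self.conv4 = conv_bn_act(c_hidden, c_hidden, 3)
        self.fuse = conv_bn_act(4 * c_hidden, 2 * c_hidden, 1)

    def forward(self, x):
        y1 = self.branch1(x)
        y2 = self.branch2(x)
        y3 = self.conv3(y2)
        y4 = self.conv4(y3)
        # concatenate all intermediate outputs, DenseNet-style
        return self.fuse(torch.cat([y1, y2, y3, y4], dim=1))
```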
(3) SPPCSPC module
This module combines a spatial pyramid pooling operation with a CSP structure and still contains many branches. The input is split across the branches: the middle branch performs the pyramid pooling operation; the left branch resembles a depthwise path, except that its middle 3×3 convolution is not grouped but remains a standard convolution; and the right branch is a 1×1 point convolution. Finally, the outputs of all branches are concatenated, as in the sketch below. (In the repository, this code is around line 262 of models/common.py.)
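Below is a simplified, hedged sketch of an SPPCSPC-style block; the 5/9/13 pooling kernels are the common setting, but the channel counts and branch names here are illustrative and should be checked against models/common.py:

```python
import torch
import torch.nn as nn

class SPPCSPCSketch(nn.Module):
    def __init__(self, c_in, c_out, pool_sizes=(5, 9, 13)):
        super().__init__()
        c_hidden = c_out // 2
        self.cv_main = nn.Sequential(                      # main (CSP) branch
            nn.Conv2d(c_in, c_hidden, 1), nn.SiLU(),
            nn.Conv2d(c_hidden, c_hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(c_hidden, c_hidden, 1), nn.SiLU(),
        )
        self.pools = nn.ModuleList(                        # pyramid pooling
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in pool_sizes
        )
        self.cv_after_pool = nn.Sequential(
            nn.Conv2d(c_hidden * (len(pool_sizes) + 1), c_hidden, 1), nn.SiLU(),
            nn.Conv2d(c_hidden, c_hidden, 3, padding=1), nn.SiLU(),
        )
        self.cv_shortcut = nn.Sequential(nn.Conv2d(c_in, c_hidden, 1), nn.SiLU())
        self.cv_out = nn.Sequential(nn.Conv2d(2 * c_hidden, c_out, 1), nn.SiLU())

    def forward(self, x):
        y = self.cv_main(x)
        y = torch.cat([y] + [pool(y) for pool in self.pools], dim=1)
        y = self.cv_after_pool(y)
        shortcut = self.cv_shortcut(x)                     # 1x1 point-conv branch
        return self.cv_out(torch.cat([y, shortcut], dim=1))
```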
(4) RepConv module
RepVGG is a multi-branch model based on the VGG design. During training, the extra branches improve performance; for inference, structural re-parameterization converts the block into a plain VGG-style stack of 3×3 convolutions and ReLU, which speeds up inference, as the sketch below illustrates. (In the repository, this code is around line 463 of models/common.py.)
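The following minimal sketch shows the re-parameterization idea for parallel 3×3 and 1×1 branches (BatchNorm fusion is omitted for brevity, so this is not the full RepConv implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepConvSketch(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv3x3 = nn.Conv2d(c_in, c_out, 3, padding=1, bias=True)
        self.conv1x1 = nn.Conv2d(c_in, c_out, 1, bias=True)
        self.fused = None

    def forward(self, x):
        if self.fused is not None:                         # inference path: one conv
            return F.relu(self.fused(x))
        return F.relu(self.conv3x3(x) + self.conv1x1(x))   # training path: two branches

    def reparameterize(self):
        # Zero-pad the 1x1 kernel to 3x3 and merge both branches into one conv.
        fused = nn.Conv2d(self.conv3x3.in_channels, self.conv3x3.out_channels,
                          3, padding=1, bias=True)
        fused.weight.data = self.conv3x3.weight.data + \
            F.pad(self.conv1x1.weight.data, [1, 1, 1, 1])
        fused.bias.data = self.conv3x3.bias.data + self.conv1x1.bias.data
        self.fused = fused

x = torch.randn(1, 8, 32, 32)
m = RepConvSketch(8, 8)
y_train = m(x)
m.reparameterize()
assert torch.allclose(y_train, m(x), atol=1e-5)  # outputs match after merging
```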
(5) E-ELAN module
This part extends the multi-channel convolution module and is only used in the larger and deeper models (the yolov7-e6e model). Most papers on efficient network design focus on the number of parameters, the amount of computation, and computational density. From the perspective of memory access, however, one can also analyze the impact of the input/output channel ratio, the number of branches in the architecture, and element-wise operations on inference speed (as proposed in the ShuffleNet V2 paper). The activation, i.e. the number of elements in the output tensors of the convolutional layers, also needs to be considered when scaling the model. In a large-scale ELAN, a stable state is reached regardless of the gradient path length and the number of computational blocks; but if ever more computational blocks are stacked, this stable state may be destroyed and parameter utilization will decrease. The authors therefore propose E-ELAN, which uses expand, shuffle, and merge-cardinality operations to improve the network's learning ability without destroying the original gradient paths. (The corresponding structural configuration, split into separate operators, can be seen in cfg/training/yolov7-e6e.yaml.)

YOLOv7 loss function

The YOLOv7 loss is broadly consistent with YOLOv5 and is divided into three parts: the coordinate (box) loss, the objectness loss (whose training target is the IoU between the predicted box and the ground truth), and the classification loss. The objectness and classification losses use BCEWithLogitsLoss (binary cross-entropy with logits), and the coordinate loss uses the CIoU loss. For details, see the ComputeLossOTA function in utils/loss.py. The IoU family of losses builds up as follows (a small standalone CIoU sketch follows the list):
IoU_Loss: considers only the overlapping area between the predicted box and the ground-truth box.
GIoU_Loss: builds on IoU and addresses the case where the boxes do not overlap.
DIoU_Loss: builds on IoU/GIoU and additionally considers the distance between the box centers.
CIoU_Loss: builds on DIoU and additionally considers the aspect-ratio consistency of the boxes.
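As a reference, here is a simplified, standalone CIoU computation for two boxes in (x1, y1, x2, y2) form; the repository's own implementation is the bbox_iou helper used by ComputeLossOTA:

```python
import math

def ciou(box1, box2, eps=1e-7):
    x1, y1, x2, y2 = box1
    X1, Y1, X2, Y2 = box2
    # intersection and union -> plain IoU (overlap term)
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    w1, h1, w2, h2 = x2 - x1, y2 - y1, X2 - X1, Y2 - Y1
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # DIoU term: squared center distance over squared enclosing-box diagonal
    cw = max(x2, X2) - min(x1, X1)
    ch = max(y2, Y2) - min(y1, Y1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((x1 + x2 - X1 - X2) ** 2 + (y1 + y2 - Y1 - Y2) ** 2) / 4
    # CIoU term: aspect-ratio consistency
    v = (4 / math.pi ** 2) * (math.atan(w2 / (h2 + eps)) - math.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v

print(ciou((0, 0, 10, 10), (2, 2, 12, 12)))  # the loss is typically 1 - CIoU
```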

System environment setup

(1) Open the Anaconda Prompt (if Anaconda is not installed on your computer, download and install it first).
(2) Create the yolo7 conda environment (conda create -n yolo7 python=3.8) and activate it (conda activate yolo7).
(3) Change into the project directory (the directory used in this article is E:\Pyside6_yolov7\yolov7).
(4) Install the dependencies: pip install -r requirements.txt
(5) Run python base_camera.py in the environment to open the system interface.
[Figure: system interface after launch]

System interface and function display

This section shows the designed software interface. The overall design is simple and clean and provides an intuitive operating experience. The main functions include the following:
- Import and initialization of model weights
- Adjustment of the detection confidence threshold and the post-processing IoU threshold
- Display of information about the detected targets
- Statistics and display of detection time
- Image import, detection, visual display, and export of results
- Video import, detection, visual display, and export of results
- Batch import of images from a folder, detection, visual display, and export of results
- Camera import, detection, visual display, and export of results
An example of the initial interface of the software is shown below:
[Figure: initial interface of the software]

Model weight selection and initialization

Users can load trained model weights by clicking the "Model Selection" button; the supported weight formats are .pt and .onnx. After loading the weights, clicking the "Model Initialization" button configures and initializes the selected model. Users can also tune the accuracy/speed trade-off of the detection results by adjusting parameters such as the confidence threshold (Confidence) and the post-processing threshold (IoU). Changing the value in the input box below Confidence or IoU moves the corresponding slider, and moving the slider updates the input box; any change in the Confidence or IoU value is synchronized to the model configuration, updating the detection confidence threshold and IoU threshold. After completing the settings, users can start the detection process and view a visual display of the results. After each operation, the system status bar (at the lower right of the window) displays the result of that operation. A hedged sketch of how the weights might be initialized is shown below.
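The sketch below shows one way the two weight formats could be initialized; attempt_load is a helper from the YOLOv7 repository and onnxruntime handles the ONNX case, but the actual wiring in this system's source may differ:

```python
import torch

def init_model(weight_path, device="cuda:0" if torch.cuda.is_available() else "cpu"):
    if weight_path.endswith(".pt"):
        from models.experimental import attempt_load   # YOLOv7 repo helper
        model = attempt_load(weight_path, map_location=device)
        model.eval()
        return ("pt", model)
    elif weight_path.endswith(".onnx"):
        import onnxruntime as ort
        session = ort.InferenceSession(weight_path,
                                       providers=["CPUExecutionProvider"])
        return ("onnx", session)
    raise ValueError("Unsupported weight format: " + weight_path)

# The thresholds adjusted in the UI are simply passed to NMS later on,
# e.g. conf_thres=0.25 and iou_thres=0.45 as typical defaults.
```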

Image selection, detection, display and export

Users can upload a single image for detection and recognition by clicking the "Image Selection" button. Then, clicking the "Detect" button (at the lower right of the window; see the legend above) makes the system complete the detection task automatically. During detection, the system shows the inference time in the "Detection Time" field and the number of detected targets in the "Number of Targets" field. The user can also select a detected target from the drop-down box and view its position labels (the top-left x coordinate xmin, the top-left y coordinate ymin, the bottom-right x coordinate xmax, and the bottom-right y coordinate ymax). After detection finishes, the result for the input image is displayed on the right side of the window.
To save the detection result, the user can click the "Export Image Result" button and enter a file name with extension (such as 1.jpg) in the dialog that pops up.
When the user clicks the "End" button (at the lower right of the window; see the legend above), the system exits the current detection task, refreshes the interface, and clears all output. The user can then click "Image Selection" or "Video Selection" again to upload new images or videos for detection. In short, the system provides a simple interface that lets users quickly run image detection and conveniently view and export the results. A hedged sketch of the single-image inference step follows, and specific operation examples are shown in the figure at the end of this section.
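The following is a hedged sketch of the single-image inference behind the button clicks; letterbox, non_max_suppression and scale_coords are helpers in the YOLOv7 repository (utils/datasets.py and utils/general.py), and the thresholds mirror the UI values:

```python
import time
import cv2
import numpy as np
import torch
from utils.datasets import letterbox
from utils.general import non_max_suppression, scale_coords

def detect_image(model, img_path, conf_thres=0.25, iou_thres=0.45, device="cpu"):
    img0 = cv2.imread(img_path)                            # original BGR image
    img = letterbox(img0, new_shape=640)[0]                # resize + pad to 640
    img = img[:, :, ::-1].transpose(2, 0, 1)               # BGR->RGB, HWC->CHW
    img = torch.from_numpy(np.ascontiguousarray(img)).float().div(255).unsqueeze(0).to(device)

    t0 = time.time()
    with torch.no_grad():
        pred = model(img)[0]
    pred = non_max_suppression(pred, conf_thres, iou_thres)[0]
    elapsed = time.time() - t0                             # shown as "Detection Time"

    if pred is not None and len(pred):
        # map boxes back to the original image size
        pred[:, :4] = scale_coords(img.shape[2:], pred[:, :4], img0.shape).round()
        for *xyxy, conf, cls in pred:                      # xmin, ymin, xmax, ymax
            cv2.rectangle(img0, (int(xyxy[0]), int(xyxy[1])),
                          (int(xyxy[2]), int(xyxy[3])), (0, 255, 0), 2)
    return img0, pred, elapsed
```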
[Figure: image detection operation example]

Video selection, detection, display and export

Users can click the "Video Selection" button to upload a video for detection and recognition. Then, clicking the "Detect" button (at the lower right of the window; see the legend above) makes the system complete the video detection task automatically. During detection, the system shows the single-frame inference time in the "Detection Time" field and the number of targets detected in the current frame in the "Number of Targets" field, while a progress bar visualizes the current detection progress. The user can also select a detected target from the drop-down box and view its position labels (the top-left x coordinate xmin, the top-left y coordinate ymin, the bottom-right x coordinate xmax, and the bottom-right y coordinate ymax). After detection finishes, the result for the input video is displayed on the right side of the window.
To make it easier to pause and inspect the video results, the system provides a "Pause" button (at the lower right of the window; see the legend above). After clicking it, detection pauses and the user can inspect the coordinate information of the detected targets through the drop-down box; clicking the "Continue" button (at the lower right of the window; see the legend above) resumes detection of the input video.
To save the video detection result, the user can click the "Video Result Export" button and enter a file name with extension (such as 2.mp4) in the dialog that pops up. When the user clicks the "End" button (at the lower right of the window; see the legend above), the system exits the current video detection task, refreshes the interface, and clears all output. A hedged sketch of the frame-by-frame video loop follows.
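The sketch below outlines the frame-by-frame video loop; detect_frame stands for a per-frame variant of the image routine sketched earlier (taking an array instead of a file path), so it is an assumption rather than a function from the project:

```python
import cv2

def detect_video(model, video_path, out_path="result.mp4"):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))      # drives the progress bar

    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        annotated, dets, elapsed = detect_frame(model, frame)  # per-frame inference
        writer.write(annotated)                          # exported result video
        frame_idx += 1
        progress = frame_idx / max(total, 1)             # shown as the progress bar

    cap.release()
    writer.release()
```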

Folder batch image import, detection, display and export

Users can upload a batch of images by clicking the "Folder" button. Then, clicking the "Detect" button (at the lower right of the window; see the legend above) makes the system detect all images in the selected folder automatically. During detection, the system shows the inference time in the "Detection Time" field and the number of detected targets in the "Number of Targets" field, while a progress bar visualizes the current detection progress. The user can also select a detected target from the drop-down box and view its position labels (the top-left x coordinate xmin, the top-left y coordinate ymin, the bottom-right x coordinate xmax, and the bottom-right y coordinate ymax). After detection finishes, the result for each input image is displayed on the right side of the window.
To save the results in batch, the user can click the "Folder Export" button and select an output folder in the dialog that pops up. When the user clicks the "End" button (at the lower right of the window; see the legend above), the system exits the current detection task, refreshes the interface, and clears all output. A short sketch of the batch loop is given below, and specific operation examples are shown in the figure at the end of this section.
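A short, hedged sketch of the batch loop over a folder; detect_image refers to the illustrative single-image routine sketched earlier:

```python
import glob
import os
import cv2

def detect_folder(model, in_dir, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for ext in ("*.jpg", "*.jpeg", "*.png", "*.bmp"):
        paths.extend(glob.glob(os.path.join(in_dir, ext)))
    for i, path in enumerate(sorted(paths), start=1):
        annotated, dets, elapsed = detect_image(model, path)   # sketched earlier
        cv2.imwrite(os.path.join(out_dir, os.path.basename(path)), annotated)
        print(f"[{i}/{len(paths)}] {os.path.basename(path)}: "
              f"{0 if dets is None else len(dets)} targets in {elapsed:.3f}s")
```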
[Figure: folder batch detection operation example]

Camera detection, display and export

The user can start the camera by clicking the "Camera On" button (by default, the first camera of the local device is opened). Then, clicking the "Detect" button (at the lower right of the window; see the legend above) makes the system complete the camera detection task automatically. During detection, the system shows the inference time in the "Detection Time" field and the number of detected targets in the "Number of Targets" field. The user can also select a detected target from the drop-down box and view its position labels (the top-left x coordinate xmin, the top-left y coordinate ymin, the bottom-right x coordinate xmax, and the bottom-right y coordinate ymax).
To save the camera detection result, the user can click the "Camera Export" button and enter a file name with extension (such as 22.mp4) in the dialog that pops up. When the user clicks the "End" button (at the lower right of the window; see the legend above), the system exits the current camera detection task, refreshes the interface, and clears all output. In short, the system provides a simple interface that lets users quickly run camera detection and conveniently view and export the results.

Dataset introduction

The deep-sea fish dataset used by this system was manually annotated with deep-sea fish categories and contains 6517 images in total. The images cover many rotations and different lighting conditions, which helps train a more robust detection model. The dataset used in the experiments of this article contains 5362 training images and 1155 validation images. To better show the distribution of the dataset, some samples from the validation set are shown in the figure below. As the figure shows, the targets in the dataset are quite diverse, which helps the model learn more robust features. To further improve the generalization ability and robustness of the model, we also apply data augmentation, including random rotation, scaling, cropping, and color transformation, which enlarges the dataset while reducing the risk of overfitting. With these operations, the model is expected to adapt better to different scenes and perform better in practical applications. A hedged sketch of these augmentation types follows.
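The snippet below is only an illustration of the augmentation types mentioned above, written with torchvision; the actual YOLOv7 training pipeline applies its own hyperparameter-controlled augmentation (mosaic, HSV jitter, and so on), and bounding boxes would also need to be transformed alongside the images:

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                     # random rotation
    transforms.RandomResizedCrop(size=640, scale=(0.8, 1.0)),  # scaling + cropping
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.3, hue=0.05),          # color transformation
    transforms.ToTensor(),
])
# augmented = augment(pil_image)  # applied per image; box labels must follow suit
```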

Key code analysis

This system implements the object detection algorithm in PyTorch, based on YOLOv7. In the training phase, we start from a pre-trained model and optimize the network parameters over many iterations to reach better detection performance. During training, we use techniques such as learning-rate decay and data augmentation to improve the generalization ability and robustness of the model. To evaluate the model properly, we run extensive experiments on the training and validation sets, and by tuning hyperparameters such as the learning rate and batch size we arrive at settings suited to this task; a sketch of the kind of learning-rate schedule involved is given below. The data augmentation described earlier (random rotation, scaling, cropping, and color transformation) again enlarges the dataset while reducing the risk of overfitting.
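As an illustration of the learning-rate decay mentioned above, the sketch below builds a cosine schedule over 300 epochs, similar in spirit to the schedule used by the YOLOv7 training script; the exact hyperparameters live in the repository's hyp files:

```python
import math
import torch

model = torch.nn.Linear(10, 2)                      # stand-in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937)

epochs, final_lr_fraction = 300, 0.1
# cosine decay from 1.0x of the base LR down to final_lr_fraction of it
cosine = lambda e: final_lr_fraction + (1 - final_lr_fraction) * \
    (1 + math.cos(math.pi * e / epochs)) / 2
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=cosine)

for epoch in range(epochs):
    # ... one training epoch over the fish dataset would go here ...
    scheduler.step()
```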
In the testing phase, the trained model is used to detect new images and videos. Detections whose confidence falls below the threshold are filtered out, yielding the final results, which can be saved in image or video format for later analysis and use. The system is based on the YOLOv7 algorithm and implemented with PyTorch; the main libraries used in the code are PyTorch, NumPy, OpenCV, and PySide6. Some of the key code of the system is shown in the figure below.
[Figure: key code of the system]

Pyside6 interface design

PySide6 is a free, cross-platform GUI library for Python: a set of Python bindings used to develop cross-platform GUI applications. It is the official Python binding for Qt 6, the successor to the Qt 5-based PySide2, and it provides developers with a powerful toolset for building cross-platform user interfaces. Its main goals are to improve performance, simplify developers' work, and provide a better user experience. The main features of PySide6 include:
Cross-platform support: PySide6 supports Windows, macOS, and Linux, making it easy to develop cross-platform GUI applications.
High performance: PySide6 builds on the latest Qt 6 technology and offers developers high performance.
Ease of use: PySide6 provides a rich API and tooling, allowing developers to build GUI applications quickly without excessive boilerplate.
Extensibility: PySide6 offers a wide range of GUI components and controls, making it easy to extend and customize an application's user interface.
Community support: PySide6 has an active community that provides extensive documentation and sample code to help developers get started quickly.

Overall, PySide6 is a powerful cross-platform GUI library for Python that gives developers a simple, easy-to-use toolset for building cross-platform user interfaces. Its performance, extensibility, and community support make it an excellent choice. The sketch below shows, in a hedged way, how the confidence slider and input box described earlier can be kept in sync with PySide6.
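The following minimal PySide6 sketch illustrates the slider/input-box synchronization behaviour described in the model-initialization section; the widget names are illustrative, not the ones used in the actual UI files:

```python
import sys
from PySide6.QtCore import Qt
from PySide6.QtWidgets import (QApplication, QDoubleSpinBox, QHBoxLayout,
                               QSlider, QWidget)

class ConfControl(QWidget):
    def __init__(self):
        super().__init__()
        self.conf_thres = 0.25
        self.slider = QSlider(Qt.Horizontal)     # integer slider 0..100
        self.slider.setRange(0, 100)
        self.spin = QDoubleSpinBox()             # float input box 0.00..1.00
        self.spin.setRange(0.0, 1.0)
        self.spin.setSingleStep(0.01)
        layout = QHBoxLayout(self)
        layout.addWidget(self.slider)
        layout.addWidget(self.spin)
        # keep the two widgets and the model threshold synchronized
        self.slider.valueChanged.connect(lambda v: self.spin.setValue(v / 100))
        self.spin.valueChanged.connect(self.on_changed)
        self.spin.setValue(self.conf_thres)

    def on_changed(self, value):
        self.slider.setValue(int(value * 100))
        self.conf_thres = value                  # passed to NMS at detect time

if __name__ == "__main__":
    app = QApplication(sys.argv)
    w = ConfControl()
    w.show()
    sys.exit(app.exec())
```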

Experimental results and analysis

In the experimental results and analysis, we evaluate the model with metrics such as precision and recall, and we also analyze the training process through the loss curves and the PR curve. In the training phase, we trained the YOLOv7 model on the dataset for 300 epochs. As the figure below shows, both the training loss and the validation loss decrease as training proceeds, indicating that the model keeps learning more accurate features. After training, we evaluated the model on the validation set of the dataset and obtained the results below.
[Figure: training and validation loss curves]

The figure below shows the PR curve of the trained YOLOv7 model on the validation set. As the figure shows, the model achieves high recall and precision, and the overall performance is good.
[Figure: PR curve on the validation set]

In summary, the YOLOv7 model trained in this post performs well on the dataset, with high detection accuracy and robustness, and it can be applied in real scenarios. The author also tested the whole system in detail and produced a smooth, high-precision detection interface, shown in the demonstration part of this post. The complete UI, test images and videos, and code files have all been packaged and uploaded; interested readers can contact me by private message for the download link. For the PDF of this post and more detection and recognition systems, follow the author's WeChat public account BestSongC (system interfaces based on the YOLOv5 and YOLOv8 algorithms have already been released, as well as a series on improving object detection algorithms).

Other deep-learning-based detection systems are also available, covering tomatoes, cats and dogs, goats, wild targets, cigarette butts, QR codes, helmets, traffic police, wild animals, wild smoke, human fall recognition, infrared pedestrians, poultry and pigs, apples, bulldozers, bees, phone-call detection, pigeons, footballs, cows, face masks, safety vests, smoke detection, and more; readers who need these can follow me and get the download links from the blogger's other posts.

The complete project directory is as follows:
[Figure: complete project directory]
