[Computer Vision | Object Detection] An introduction to YOLO-NAS and how to use it (with source code)

1. Introduction

GitHub repository:

https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md

1.1 Highlights

  • Following QARepVGG, this solution introduces QSP and QCI blocks so the model benefits from both re-parameterization and 8-bit quantization;
  • It uses AutoNAC to search for the optimal size and structure of each stage, including block type, block count, and channel count;
  • A hybrid quantization mechanism is used for model quantization, accounting both for each layer's impact on accuracy and latency and for the overall latency cost of switching between 8-bit and 16-bit precision;
  • Pre-training recipe: automatically labeled data, self-distillation, and large datasets.

All in all, YOLO-NAS sets a new bar for the object detection task, achieving the best accuracy-latency trade-off to date. It is worth mentioning that YOLO-NAS is fully compatible with the TensorRT inference engine and supports INT8 quantization for unprecedented runtime performance.
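
As a concrete example of that deployment path, recent super-gradients releases document a one-line ONNX export for YOLO-NAS, and the resulting ONNX file can then be handed to TensorRT. The exact export API is version-dependent, so treat the following as a sketch rather than a guaranteed interface:

# Hedged sketch: exporting YOLO-NAS to ONNX for TensorRT deployment.
# model.export() is documented for recent super-gradients releases;
# verify it exists in the version you have installed.
from super_gradients.training import models

model = models.get("yolo_nas_l", pretrained_weights="coco")
model.export("yolo_nas_l.onnx")  # ONNX graph that TensorRT can build an engine from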

1.2 Solution introduction

Inspired by YOLOv6, YOLOv7, and YOLOv8, Deci's researchers used AutoNAC to search for an architecture that outperforms YOLOv8; in their words, "We used machine learning to find a new deep learning architecture!"

Why use AutoNAC? Because manually finding the "right" structure is inefficient and tedious, Deci's researchers instead used AutoNAC to search for new object detection models while minimizing inference latency on an NVIDIA T4.

To build YOLO-NAS, the authors constructed an enormous search space (roughly 10^14 candidate architectures) to explore the accuracy-latency frontier, and finally derived YOLO-NAS-S, YOLO-NAS-M, and YOLO-NAS-L from three points on that frontier.

1.3 Introduction to training

YOLO-NAS adopts a multi-stage training method, including:

  1. Pre-training: Objects365 + COCO pseudo-labeled data
  2. Knowledge distillation
  3. DFL, i.e. Distribution Focal Loss (see the formula after this list)
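
For reference, DFL comes from the Generalized Focal Loss paper (Li et al., 2020): box regression is cast as learning a discrete distribution over candidate coordinate values. For a continuous target y lying between the two nearest discrete bins y_i and y_{i+1}, with predicted bin probabilities S_i and S_{i+1}, the loss is

DFL(S_i, S_{i+1}) = -((y_{i+1} - y) * log(S_i) + (y - y_i) * log(S_{i+1}))

which concentrates probability mass on the two bins surrounding the target, so the network learns a sharp distribution around the true box boundary.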

Deci is excited to announce the release of a new object detection model, YOLO-NAS - a game changer in object detection, delivering superior real-time object detection capabilities and production-ready performance. Deci's mission is to provide AI teams with the tools to remove development barriers and achieve efficient inference performance faster.

In terms of training data, the authors also fine-tuned on Roboflow-100 (a collection of 100 datasets from different domains) to verify the model's ability to handle complex detection tasks.

We demonstrate the excellent performance of YOLO-NAS on downstream tasks. When fine-tuned on Roboflow-100, our YOLO-NAS model achieves higher mAP than the nearest competitor:

The architecture of YOLO-NAS uses quantization-aware blocks and selective quantization to optimize performance. When converted to INT8, YOLO-NAS suffers a smaller accuracy drop (0.51, 0.65, and 0.45 mAP points for the S, M, and L variants respectively) than other models, which typically lose 1-2 mAP points during quantization. These techniques result in an innovative architecture with superior object detection capabilities and best-in-class performance.

2. Use cases

First install the corresponding package:

%%capture
!pip install super-gradients==3.2.0
!pip install imutils
!pip install roboflow
!pip install pytube --upgrade
!pip install torchinfo

%%capture is a Jupyter magic that suppresses the output of the cell it heads; the commands below it install the required libraries and dependencies via pip (the Python package manager). Here is what each command does:

  • !pip install super-gradients==3.2.0: installs the super-gradients library, pinned to version 3.2.0.
  • !pip install imutils: installs imutils, commonly used for image processing and computer vision tasks.
  • !pip install roboflow: installs roboflow, commonly used for data preprocessing and dataset management in machine learning and computer vision.
  • !pip install pytube --upgrade: installs or upgrades pytube, commonly used to download videos from YouTube.
  • !pip install torchinfo: installs torchinfo, commonly used to inspect the structure and parameters of PyTorch models.

With the dependencies in place, load a pre-trained model:

from super_gradients.training import models

yolo_nas_l = models.get("yolo_nas_l", pretrained_weights="coco")

This code uses the models module from the super_gradients library to obtain a pre-trained model named yolo_nas_l and load its pre-trained weights.

from super_gradients.training import models: This line imports the models module from the super_gradients.training package.

yolo_nas_l = models.get("yolo_nas_l", pretrained_weights="coco"): This line calls the models.get() function to obtain the pretrained model named yolo_nas_l. The model is selected by passing the string "yolo_nas_l"; this is the large variant of the YOLO-NAS object detection family.

pretrained_weights="coco": This additional parameter specifies which pretrained weights to use. "coco" means the model was pre-trained on the COCO (Common Objects in Context) dataset, which contains images and labels for a wide variety of everyday objects.

Ultimately, yolo_nas_l holds a loaded pre-trained model that we can use for object detection directly or fine-tune to fit a specific task or dataset.
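
The other two model sizes are loaded the same way; a minimal sketch (model names as listed in the super-gradients model zoo, with COCO weights downloaded on first use):

from super_gradients.training import models

# Small and medium YOLO-NAS variants, loaded exactly like the large one.
yolo_nas_s = models.get("yolo_nas_s", pretrained_weights="coco")
yolo_nas_m = models.get("yolo_nas_m", pretrained_weights="coco")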

from torchinfo import summary

summary(model=yolo_nas_l,
        input_size=(16, 3, 640, 640),
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"]
)

This code uses the summary function from the torchinfo library to generate a summary of the model's structure and parameters, with arguments that configure the output format.

from torchinfo import summary: This line imports the summary function from the torchinfo library, which is used to generate summaries of PyTorch models.

summary(model=yolo_nas_l, input_size=(16, 3, 640, 640), col_names=["input_size", "output_size", "num_params", "trainable"], col_width=20, row_settings=["var_names"]): This call to the summary function accepts several arguments:

model=yolo_nas_l: This specifies that the model to be analyzed is yolo_nas_l, the model previously loaded via the super_gradients library.

input_size=(16, 3, 640, 640): This specifies the size of the input tensor: a batch size of 16, 3 channels (an RGB image), and an image size of 640x640.

col_names=["input_size", "output_size", "num_params", "trainable"]: This list of strings specifies the columns of the summary table: input size, output size, number of parameters, and whether each layer's parameters are trainable.

col_width=20: This sets the width of each column in the table, for readability.

row_settings=["var_names"]: This list of strings specifies extra information to show on each row; "var_names" makes each row display the variable name of the corresponding module.

Ultimately, this code prints a table summarizing the structure and parameters of the yolo_nas_l model, including input and output sizes, parameter counts, and trainability, helping us better understand the model's structure and size.
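
One practical note: summary only runs a dummy forward pass at the given input_size; for real inference it is worth moving the model to a GPU when one is available. This is plain PyTorch, since YOLO-NAS models are ordinary nn.Module objects:

import torch

# Standard PyTorch device placement; nothing here is specific to super-gradients.
device = "cuda" if torch.cuda.is_available() else "cpu"
yolo_nas_l = yolo_nas_l.to(device)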

image_path = "D:/CodeProject/Yolov8/2.jpg"
yolo_nas_l.predict(image_path, conf=0.25).show()

This code uses the yolo_nas_l model to perform object detection and displays an image of the detection results.

image_path = "D:/CodeProject/Yolov8/2.jpg": This line specifies the file path of the input image to run detection on, here "D:/CodeProject/Yolov8/2.jpg".

yolo_nas_l.predict(image_path, conf=0.25): This is where the model performs object detection. yolo_nas_l is the pre-trained detection model loaded earlier; the predict function accepts the path to the input image and a conf parameter that sets the confidence threshold for detection.

image_path: The path of the input image, telling the model which image to run object detection on.

conf=0.25: The confidence threshold, which controls which detection boxes are kept; only detections with a confidence of at least 0.25 are retained.

.show(): The method that displays the detection results, drawing bounding boxes, class labels, and related information on the input image.

Taken together, this code runs the yolo_nas_l model on the image at the specified path and displays the result, keeping only detections with confidence of at least 0.25. This is a common pattern for detecting objects in images and visualizing the results.
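
If you need the raw detections rather than a rendered image, the prediction object can also be read programmatically. The attribute names below (prediction.bboxes_xyxy, confidence, labels, class_names) follow the super-gradients 3.x detection API, but verify them against your installed version; this is a sketch, not a guaranteed interface:

# Hedged sketch: reading boxes, scores, and class names from the prediction.
predictions = yolo_nas_l.predict(image_path, conf=0.25)
for image_prediction in predictions:
    pred = image_prediction.prediction
    for box, score, label in zip(pred.bboxes_xyxy, pred.confidence, pred.labels):
        class_name = image_prediction.class_names[int(label)]
        print(f"{class_name}: {score:.2f} at {box.tolist()}")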

Origin: blog.csdn.net/wzk4869/article/details/132939774